Pervasive Parallelism Lunch, 2016-17 Series
All Pervasive Parallelism Lunches start at 12:15pm, and take place every Wednesday in the Informatics Forum Mini-Forum 2 [IF-4.40], unless otherwise noted.
For details about PPar Lunch logistics, please see the main Pervasive Parallelism Lunch Programme page.
*NOTE* students and speakers are free to swap dates with fellow organizers/speakers if unavailable on the assigned date. Please inform the PPar Administrator of any changes.
|Date||Student Organizer||Speaker||Title / Abstract|
|28-Sep-16||Justs Zariņš||Rajkarn Singh||Optimizing Utility in Multiple Query Scenario under Differential Privacy
With the advancement of smart device technologies and easy Internet connectivity, there is a gigantic upsurge in the availability of users? personal information. Although being very useful, data may contain sensitive user information and hence needs to be protected from adversaries. Differential privacy (DP) is developed as a means to achieve this goal along with providing useful data statistics. DP achieves privacy by providing results in the form of a distribution over the query outcome.
In this talk, I’ll describe in detail the notion of DP and the mechanism used to achieve it. I’ll also briefly talk about the correlated query scenario with in which case the performance of DP degrades because of independent result distributions.
|05-Oct-16||Amna Shahab||Murray Cole||Introducing Cohort 3|
|12-Oct-16||Jakub Zalewski||Daniel Hillerström||Runtime Agnostic Concurrency with Handlers
In the idealised world we would “write once, use everywhere”. This is every hard to achieve in practice, and in particular when we find ourselves in a parallel setting. Simplifying a bit, we can divide the work on parallelisation into two camps: the automatic parallelisation camp which is often inhabited by compiler community and the all manual parallelisation camp which is typically inhabited by the HPC community. With the continuous emergence of new parallel architecture it is increasingly difficult to make everything automatic. Conversely, coding everything by hand allows programmers to bake domain-specific knowledge directly into the application, but this often comes at the cost of modularity. My work is positioned in the “all manual camp”, but I bring a set of new, modular tools. In this work I make novel use of algebraic effects and handlers to decouple the concrete concurrency implementation from the concurrency model that my applications use. Consequently, this allows me to modularly pick any suitable instantiation of my model. NB! This is very early work-in-progress.
|19-Oct-16||Christos Vasiladiotis||Hugh Leather||CompuCast|
|02-Nov-16||Christos Vasiladiotis||Mark Parsons||Exascale Computing
HPC is target driven and the next key target is the development and installation of the first Exascale computers in the early to mid-2020s. These computers will exhibit extreme parallelism of at least 100 million parallel threads and consume more than 25MW of power.
This short talk will look at where we are today and where we need to get to in order to use such systems.
Exascale is not just a hardware challenge but also a major software challenge. None of today’s applications scale to this level of parallelism – a major challenge for parallel computing.
|09-Nov-16||Victor Ivanov||Vanya Yaneva||Compiler-Assisted Test Acceleration Using GPUs
Software is found everywhere in our everyday life – in all of our mobile devices, in our cars, in our hospitals, in our workplaces. A lot of this software is in the form of embedded applications for smart sensors. The safety and correctness of this software is of paramount importance. Testing is a critical part of the develpment process, having led to the wide adoption of Test Driven Development as a standard software practice. However, rigorous testing requires the regular execution of huge test suites, which can be extremely time consuming, as well as costly in terms of testing infrastucture, maintenance and energy consumed. We propose looking at exploiting GPUs for executing tests in parallel.
In this talk I will present our proposed approach in some detail. With the help of a clang-based code generation tool, OpenCL kernels for the tested application are automatically generated and the tests are executed on the GPU threads. I will also present the speedup achieved when using this approach on programs from the EEMBC benchmark suite and discuss some of the work that is left for the future.
|23-Nov-16||Paul Metzger||Wenfei Fan||Parallelizing Sequential Graph Computations
This talk presents GRAPE, a parallel system for graph computations. GRAPE differs from previous systems in its ability to automatically parallelize existing sequential graph algorithms, without the need for recasting the algorithms into a new model. Underlying GRAPE are a simple programming model and a principled approach, based on partial evaluation and incremental computation. We show that sequential graph algorithms can be “plugged into” GRAPE and get parallelized. As long as the sequential algorithms are correct, their GRAPE parallelization guarantees to terminate with correct answers under a monotonic condition. Moreover, algorithms in MapReduce, BSP and PRAM can be optimally simulated on GRAPE. In addition to the ease of programming, GRAPE achieves comparable performance to the state-of-the-art graph systems.
This is joint work with Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao and Chao Tian.
|30-Nov-16||Ludovic Capelli||Philip Ginsbach||Automatic Identification and Parallelisation of General Reduction Operations
Discovering and exploiting reduction parallelism has been studied for many years but no current approach scales well beyond simplistic scalar reductions with low arithmetic intensity. We developed a new approach that automatically detects a wide class of performance critical reductions in established benchmark suites. This approach is based on a constraint formulation of the reduction idiom and has been implemented as an LLVM pass. We use a customized constraint solver to identify program subsets that adhere to the constraint specification.
Once discovered, we automatically generate parallel code to exploit the reduction. This approach is robust and was evaluated on C versions of three well known benchmark suites: NAS, Parboil and Rodinia. We detected 84 scalar reductions and 6 histograms, outperforming existing approaches and show that exploiting histograms gives significant performance improvement.
|08-Dec-16 (THURSDAY)||Administrator & Volunteers||PPar Lunch Deluxe||End of semester social.|
Title / Abstract
|11-Jan-17||Vasilis Gavrielatos||Larisa Stoltzfus||Performance, Portability and Productivity for Room Acoustics Simulations
As Moore’s Law loses relevance and the parallel computing landscape expands, it is becoming increasingly difficult for computational scientists to write performant portable simulations targeting new architectures without rewriting their codes. Many of the relevant advancements this domain are often tested out only on simple benchmarks, which can create gaps in the utility of new frameworks for more complicated scientific codes. My research approaches this problem from the bottom up, by developing a solution specific to the application domain at hand: room acoustics. Room acoustics simulations model sound in 3D spaces and are useful in fields such as virtual reality and musical composition. In this talk I will discuss my work towards developing a performant, portable and productive room acoustics model using the Lift framework, which will be applicable for other wave-based simulations.
|18-Jan-17||Paul Metzger||Garrett Morris||Safe, Expressive, and Effectful
This talk will summarize my recent work on integrating linear typing and functional programming, explore the relevance of linear typing to problems in domains such as concurrency and systems programming, and discuss the use of my approach to express (as libraries) other approaches to effectful programming, such as region types, ownership types, or adoption and focus.
|25-Jan-17||Victor Ivanov||Daniel Mills||Quantum Computing (some interesting bits and bobs)
During this talk I will try to explain some of the ways in which quantum computing is different from classical computing and some of the algorithms and ideas which best exhibit this separation. I will also end with some of the work from my masters dissertation which explored how we can test for the amount of quantum power a computer holds.
|01-Feb-17||Federico Pizzuti||[no speaker]||PPar Social Lunch|
|08-Feb-17||Federico Pizzuti||Mark Miller (alumnus)||Mark will discuss his experience in the field, outlining the lessons learned and provide students to ask any questions as they consider their paths post-PPar.|
|15-Feb-17||Lewis Crawford||Paul Piho||Controlling Collective Adaptive Systems
In this talk I will introduce a class of systems called collective adaptive systems, give some reasons why they are interesting, and talk about some of my currently planned work on gaining insight on how to effectively control such systems.
|22-Feb-17||Lewis Crawford||Nick Brown||ePython: An implementation of Python for the many-core Epiphany co-processor
The Epiphany is a many-core, low power, low on-chip memory architecture and one can very cheaply gain access to a number of parallel cores which is beneficial for HPC education and prototyping. The very low power nature of these architectures also means that there is potential for their use in future HPC machines; however, there is a high barrier to entry in programming them due to the associated complexities and immaturity of supporting tools. I will talk about ePython, a subset of Python for the Epiphany and similar many-core co-processors. Due to the limited on-chip memory per core we have developed a new Python interpreter and this, combined with additional support for parallelism, has meant that novices can take advantage of Python to very quickly write parallel codes on the Epiphany and explore concepts of HPC using a smaller scale parallel machine. The high level nature of Python opens up new possibilities on the Epiphany and other similar architectures that the community have already started to adopt and use to explore concepts of parallelism and HPC.
|08-Mar-17||Rudi Horn||n/a||PPar Social Lunch|
|15-Mar-17||Antonis Katsarakis||Floyd Chitalu||Hardware Adaptive Simulation of Physical Deformation and Interaction Using GPUs
Finite Element Methods (FEM) are a well known technique to solve elasticity problems with their ability to provide a continuous description of the underlying physical equations. The associated advantages include the ability to accurately represent deformation while using only a few parameters. In this talk, will briefly summarize my research on physics-based simulation of elastic deformation and collision detection. I will first provide brief overview of my Masters.
work on GPU based real-time Continuous Collision Detection and then discuss current research on the topic of simulating geometrically non-linear elastic deformation at interactive rates using model reduction and GPUs
|22-Mar-17||Antonis Katsarakis||Vashti Galpin||CARMA: A quantitative formal language for modelling collective adaptive systemsIn this talk, I’ll introduce the language CARMA which has been developed as part of the QUANTICOL EU project. This project investigates the quantitative modelling of collective adaptive systems. I’ll give a very brief history of various languages that led to CARMA, highlight keys feature of CARMA itself and talk about three different models we have developed in CARMA relating to ambulance deployment, pedestrian movement and food security.|
|29-Mar-17||Pepijn Kokke||Caoimhin Hugh Laoide-Kemp||Streaming Communication Patterns for Multidimensional Domain Decomposition Methods
One of the most significant challenges faced by parallel computing is that of efficient communication between processors. This is becoming more and more important as the industry advances towards exascale computing, as many current algorithms will not scale to the required numbers of processors. One such algorithm is halo exchange – a standard communication pattern for problems that are decomposed by domain. This talk will discuss some of the problems with using halo exchange on exascale machines, and also explore the advantages of a stream-based alternative. It will cover the topic of multidimensional decompositions and the challenges they pose to both streaming and exchange patterns.
|Pepijn Kokke||Stefan Fehrenbach (affiliate student)||Compiling Query DSLs over Complex Objects to Spark
Assume you have a query over complex data. It’s lots of data, so you want to distribute execution. What do you do? Write Spark code. What, by hand? No. You compile your query to Scala code that calls Spark APIs for you, obviously. Bonus points for a compiler written in Coq. That’s what I did at IBM last summer.
|12-Apr-17||Vasileios Gavrielatos||Myrto Arapinis|
|19-Apr-17||Ludovic Capelli||Chris Huenen||Can quantum theory be characterized in terms of information-theoretic constraints?
Does information play a significant role in the foundations of physics? We investigate whether information-theoretic constraints characterize quantum theory. In a C*-algebraic framework, this is known to hold via three equivalences: no broadcasting and noncommutativity; no bit commitment and nonlocality; no signalling and kinematic independence. But this complex linear framework could be said to build in quantum theory from the start. We show that the first two equivalences break, and the third holds, in a framework of generalized, possibly nonlinear, C*-algebras. This framework of monoidal categories covers arbitrary concurrent computations.
|26-Apr-17||Thomas Wright||Ludovica Luisa Vissat (affiliate student)||Modelling and analysis of spatio-temporal properties of stochastic systems
In this talk I will present a novel formal framework to model and study spatio-temporal properties of stochastic models of ecological systems. This formal framework consists of a new stochastic process algebra MELA, for Modelling in Ecology with Location Attributes, and suitable techniques to analyse properties expressed through suitable spatio-temporal logics. The aim of MELA is to provide ecologists with a straightforward yet flexible modelling tool, allowing the modeller to create stochastic, individual-based and spatially explicit models. Using stochastic simulations describing the dynamic evolution of the systems, we perform statistical model checking to analyse properties of the models, formally expressed through the recently proposed Signal Spatio-Temporal Logic (SSTL). I will explain how this has been extended to widen the analysis of stochastic systems, introducing a case-study to show interesting examples of applications of this new logic.
|03-May-17||Rudi Horn||Thomas Wright||Communication, computers, and the chemistry of life
What do chemicals, creatures, and computers all have in common? Communication. For a species that spends most of our time messaging, emailing, tweeting, poking, and just talking, we give relatively little thought to what communication really is. In this talk I will attempt to remedy this by introducing you to process algebra, a mathematical theory which captures the key algebraic features of communication, and abstracts away all of the annoying social and ethical details. We will then see how everything a computer can do can be accomplished by talking alone, how to model competition of species as communication, learn what language chemicals speak, and how to talk ourselves into knots.
|10-May-17||Rodrigo Rocha||Amna Shahab||Rethinking On-chip Caches for CMP Servers
Multicore servers dedicate close to half of their chip area for a large cache pool. On-chip caches prevent frequent slow off-chip memory accesses making them essential for good system performance. Cache capacity requirements grow in line with the growing application working set sizes. This presents a two-fold problem: on-chip communication delay increases with cache size making cache accesses slow, and secondly the available cache capacity is rendered insufficient. Commercial workloads are sensitive to both cache size(on account of their large working sets) and cache access latency(on account of limited memory level parallelism). There is a need for large caches without compromising on access latency. We propose to employ vertical integration to provide large cache capacities very close to the operating cores for fast accesses. In this talk, I will present a logical arrangement for on-chip stacked DRAM caches which minimizes access latency while still providing large capacities per core.
|17-May-17||Rodrigo Rocha||n/a||PPar Social Lunch|
|24-May-17||Thomas Wright||n/a||PPar Social Lunch|
|31-May-17||Rodrigo Rocha||Christos Vasiladiotis||A hybrid commutativity-based analysis framework for detecting parallelism
Parallelizing compilers have mostly exhausted any benefits of static analyses, largely based on dependence, used to detect parallelism in sequential legacy source code. Using a hybrid static-dynamic framework based on commutativity and
|07-Jun-17||Naums Mogers||PPar Lunch Deluxe||End of semester lunch series wrap-up/social.|