Pervasive Parallelism Lunch, 2017-18 Series
We gather for Pervasive Parallelism Lunches on Wednesdays from 12:15pm, with the talk starting at 12:30pm sharp, in Informatics Forum Mini-Forum 2 [IF-4.40], unless otherwise noted.
For details about PPar Lunch logistics, please see the main Pervasive Parallelism Lunch Programme page.
*NOTE*: Students and speakers are free to swap dates with fellow organizers/speakers if unavailable on the assigned date. Please inform the PPar Administrator of any changes.
|Date||Student Organizer||Speaker||Title / Abstract|
|27-Sep-17||Chris Vasiladiotis||Murray Cole||Introducing Cohort 4|
|4-Oct-17||Chris Vasiladiotis||Chris Cummins|
|11-Oct-17||Nicolai Oswald||–||No Speaker. Meet to chat.|
|18-Oct-17||Margus Lind||Hugh Leather||Introducing Compucast|
|25-Oct-17||Margus Lind||Martin Rüfenacht||Playing with the Bandwidth of Recursive Multiplying|
|1-Nov-17||Maxi Behnke||Arpit Joshi||Architectural Support for Persistent Memory
Emerging non-volatile memory technologies (like 3D XPoint) enable fast, fine-grained persistence compared to slow block-based devices (like disks). However, ensuring consistency of data structures in non-volatile (persistent) memory is a challenge. Ordering and atomic durability are two primitives that can be used to ensure that updates to persistent memory happen in a consistent manner. In this talk, we will see that current support for ordering using persist barriers and for atomic durability using software logging adds cache line flushes to the critical path. As a solution to this problem, we first propose an efficient persist barrier that reduces the number of cache line flushes on the critical path. Then we present ATOM, a hardware log manager based on undo logging that performs the logging operation off the critical path.
|8-Nov-17||Margus Lind||Bjoern Franke||Generalized Profile-Guided Iterator Recognition|
|15-Nov-17||n/a||n/a||No lunch (PG Open day)|
|22-Nov-17||Brian Coyle||No speaker||PPar social lunch.|
|29-Nov-17||Brian Coyle||Artemiy Margaritov||Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores
In a drive to maximize resource utilization, today’s datacenters are moving to colocation of latency-sensitive and batch workloads on the same server. State-of-the-art deployments colocate such diverse workloads even on a single SMT core. This form of aggressive colocation is possible because a latency-sensitive service operating below its peak load has significant performance slack in its response latency with respect to the QoS target. In my talk, I will first show that many batch applications can greatly benefit from a large instruction window to uncover ILP and MLP. I will then show that the performance slack inherent in latency-sensitive workloads operating at low to moderate load makes it safe to shift microarchitectural resources to a co-running batch thread without compromising QoS targets. Lastly, I will introduce Stretch, a simple ROB partitioning scheme that is invoked by system software to provide one hardware thread with a much larger ROB partition at the expense of another thread. When Stretch is enabled for latency-sensitive workloads operating below their peak load on an SMT core, co-running batch applications gain 13% in performance on average (30% max) over a baseline SMT colocation, without compromising QoS constraints.
|06-Dec-17||Karen & Volunteers||PPar Lunch Deluxe||End of semester social.|
|Date||Student Organizer||Speaker||Title / Abstract|
|17-Jan-18||Mattia Bradascio||No speaker||PPar social lunch.|
|24-Jan-18||Mattia Bradascio||Philip Ginsbach||Formalising Computational Idioms for Compilers
Different communities in informatics have identified computational idioms as an important concept to better understand and exploit parallelism in software. The Berkeley Parallel Dwarfs classify scientific workloads into categories, Algorithmic Skeletons allow reasoning about parallelism from a software engineering perspective, and higher-order functions in functional languages help reveal the compositionality of algorithms. This work, however, has so far not had much impact on compilers.
With the Idiom Description Language (IDL), we are able to formalize a concept of computational idioms for use in mainstream compilers to allow idiom-specific optimization and parallelisation of C/C++ programs. Using domain-specific knowledge, this allows us to exploit parallel and heterogeneous hardware for programs that are beyond the scope of established static analysis methods.
|31-Jan-18||Mattia Bradascio||Floyd Chitalu||Real-time CPU-GPU streaming of lightfield video
Lightfield (volumetric) video, as a high-dimensional function, is very demanding in terms of storage. As such, lightfield video data, even in a compressed form, do not typically fit in GPU or main memory unless the capture area, resolution or duration is sufficiently small. Additionally, latency minimization, critical for viewer comfort in use-cases such as virtual reality, places further constraints on many schemes. In this talk, I’ll present a method we developed at Disney Research for streaming lightfield video, parameterized on viewer location and time, that efficiently handles RAM-to-GPU memory transfers of lightfield video in a compressed form. I’ll also briefly share my experience of doing an internship.
|7-Feb-18||Pablo Andres-Martinez||Justs Zarins||Progressive load balancing of asynchronous algorithms
Synchronisation in the presence of noise and hardware performance variability is a key challenge that prevents applications from scaling to large problems and machines. Using asynchronous or semi-synchronous algorithms can help overcome this issue, but at the cost of reduced stability or convergence rate. In this paper we propose progressive load balancing to manage progress imbalance in asynchronous algorithms dynamically. In our technique the balancing is done over time, not instantaneously.
Using Jacobi iterations as a test case, we show that, with CPU performance variability present, this approach leads to a higher iteration rate and lower progress imbalance between parts of the solution space. We also show that under these conditions the balanced asynchronous method outperforms synchronous, semi-synchronous and totally asynchronous implementations in terms of time to solution.
* Not to be confused with WebAssembly, an emerging alternative compilation target for the Web.
|21-Feb-18||Pablo Andres-Martinez||Simon Fowler||Unlocking Functional Web Programming
The Elm programming language addresses these problems neatly: a functional model describes the state of the page, and a rendering function displays the model as HTML. Each component on the page produces messages, which update the model and therefore the rendered HTML.
In the first part of this talk, I will describe the Elm architecture, and our work porting the Elm architecture to the Links programming language developed at Edinburgh.
In the second part of the talk, I will give a quick introduction to hobbyist lockpicking.
|28-Feb-18||Aleksandr Maramzin||Paul Piho|
|7-Mar-18||Aleksandr Maramzin||Amna Shahab|
|14-Mar-18||Aleksandr Maramzin||Rajkarn Singh|
|21-Mar-18||Nicolai Oswald||Larisa Stoltzfus|
|28-Mar-18||Nicolai Oswald||Vanya Yaneva|
|4-Apr-18||Martin Kristien||Jakub Zalewski|
|25-Apr-18||Bruce Collie||Rodrigo Caetano de Oliveira Rocha|
|2-May-18||Bruce Collie||Dan Mills|
|9-May-18||Martin Kristien||Lewis Crawford|
|16-May-18||Maxi Behnke||Vasilis Gavrielatos|
|23-May-18||Jack Turner||Rudi Horn|
|30-May-18||Jack Turner||Viktor Ivanov|
|06-Jun-18||Karen & Volunteers||PPar Lunch Deluxe||End of semester lunch series wrap-up/social.|