Transparently consistent asynchronous shared memory

Authors:
Hakan Akkan;Latchesar Ionkov;Michael Lang
Affiliations:
New Mexico Consortium;Los Alamos National Laboratory;Los Alamos National Laboratory
Venue:
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
Year:
2013

Abstract

The advent of many-core processors is imposing many changes on the operating system. The resources under contention have changed: previously, CPU cycles were the resource in demand and required fair and precise sharing. Now compute cycles are plentiful, but the memory per core is decreasing. In the past, scientific applications used all the CPU cores to finish as fast as possible, with visualization and analysis of the data performed after the simulation finished. With decreasing memory available per core, as well as the higher price (in power and time) for storing data on disk or sending it over the network, it now makes sense to run visualization and analytics applications in situ, while the simulation is running. Visualization and analytics applications then need to sample the simulation memory with as little interference and as few changes to the simulation code as possible. We propose an asynchronous memory sharing facility that allows consistent states of the memory to be shared between processes without any implicit or explicit synchronization. We distinguish two types of processes: a single producer and one or more observers. The producer modifies the state of the data, making consistent versions of the state available to any observer. The observers, working at different sampling rates, can access the latest available consistent state. Applications that would benefit from this type of facility include checkpointing, process monitoring, unobtrusive process debugging, and the sharing of data for visualization or analytics. To evaluate our ideas, we developed two kernel-level implementations for sharing data asynchronously and compared them to a traditional user-space synchronized multi-buffer method. In our tests, we observed improvements of up to 3.5x over the traditional multi-buffer method with 20% of the data pages touched.
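
The producer/observer contract described in the abstract can be illustrated with a small user-space sketch. This is not the authors' kernel-level implementation, and the abstract does not specify how theirs works; the class `VersionedState` and its methods `write`, `publish`, and `latest` are hypothetical names chosen for illustration. The sketch assumes a single producer thread, relies on reference assignment being atomic in CPython, and copies the whole state on every publish, which is a simplification a real facility would likely avoid.

```python
class VersionedState:
    """Single-producer, multi-observer snapshot publication (illustrative only)."""

    def __init__(self, size):
        # Producer's private working copy; may be inconsistent between publishes.
        self._working = bytearray(size)
        # Latest consistent state as an immutable (version, snapshot) pair.
        self._published = (0, bytes(size))

    def write(self, offset, data):
        # Producer only: mutate the working copy. Observers never read this buffer.
        self._working[offset:offset + len(data)] = data

    def publish(self):
        # Producer only: freeze the working copy, then swap it in with a single
        # reference assignment so observers see either the old or the new pair.
        version = self._published[0] + 1
        self._published = (version, bytes(self._working))
        return version

    def latest(self):
        # Observer: fetch the most recent consistent pair without taking a lock.
        return self._published


state = VersionedState(8)
state.write(0, b"abcd")              # partial update, not yet visible
v0, snap0 = state.latest()           # observer still sees the initial state
state.write(4, b"efgh")
v1 = state.publish()                 # the update becomes visible as one version
v, snap = state.latest()
```

An observer that samples slowly simply skips intermediate versions: each call to `latest` returns whatever consistent state was published most recently, and a snapshot already held by an observer is never modified by later writes.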