The advent of many-core processors is imposing many changes on the operating system. The resources under contention have changed: previously, CPU cycles were the resource in demand and required fair and precise sharing. Now compute cycles are plentiful, but the memory per core is decreasing. In the past, scientific applications used all the CPU cores to finish as fast as possible, with visualization and analysis of the data performed after the simulation finished. With less memory available per core, and a higher price (in power and time) for storing data on disk or sending it over the network, it now makes sense to run visualization and analytics applications in situ, while the simulation is running. Visualization and analytics applications then need to sample the simulation memory with as little interference and as few changes to the simulation code as possible. We propose an asynchronous memory sharing facility that allows consistent states of the memory to be shared between processes without any implicit or explicit synchronization. We distinguish two types of processes: a single producer and one or more observers. The producer modifies the state of the data and makes consistent versions of that state available to any observer. The observers, working at different sampling rates, can access the latest available consistent state. Applications that would benefit from this type of facility include checkpointing, process monitoring, unobtrusive process debugging, and the sharing of data for visualization or analytics. To evaluate our ideas, we developed two kernel-level implementations for sharing data asynchronously and compared them to a traditional user-space synchronized multi-buffer method. In our tests, we observed improvements of up to 3.5x over the traditional multi-buffer method with 20% of the data pages touched.
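For reference, a user-space synchronized multi-buffer scheme of the kind used as the baseline above can be sketched roughly as follows. This is a minimal illustration in Python, assuming a two-buffer variant guarded by a single lock. The class `DoubleBufferChannel`, its methods `publish` and `snapshot`, and the `run_demo` helper are hypothetical names chosen for this example; they are not taken from the evaluated implementations, and the sketch does not model the kernel-level facility itself.

```python
import threading


class DoubleBufferChannel:
    """Hypothetical user-space baseline: one producer, any number of observers."""

    def __init__(self, size):
        self._buffers = [bytearray(size), bytearray(size)]
        self._front = 0      # index of the latest consistent buffer
        self._version = 0    # number of states published so far
        self._lock = threading.Lock()

    def publish(self, state):
        """Producer only: fill the back buffer, then swap it in under the lock."""
        back = 1 - self._front
        # Observers only ever read the front buffer, and only while holding
        # the lock, so the back buffer can be written without the lock.
        self._buffers[back][:] = state
        with self._lock:
            self._front = back
            self._version += 1

    def snapshot(self):
        """Observer: return (version, copy of the latest consistent state)."""
        with self._lock:
            return self._version, bytes(self._buffers[self._front])


def run_demo(steps=200, size=4096, observers=2):
    """Publish `steps` states while observers sample; count torn samples."""
    chan = DoubleBufferChannel(size)
    torn = []
    done = threading.Event()

    def observe():
        while not done.is_set():
            _, data = chan.snapshot()
            # Each published state repeats one byte value, so a consistent
            # sample contains exactly one distinct value.
            if len(set(data)) != 1:
                torn.append(data)

    threads = [threading.Thread(target=observe) for _ in range(observers)]
    for t in threads:
        t.start()
    for step in range(1, steps + 1):
        chan.publish(bytes([step % 256]) * size)
    done.set()
    for t in threads:
        t.join()
    return chan.snapshot()[0], len(torn)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch every observer sample and every producer swap passes through the same lock, and each sample copies the whole buffer. That explicit synchronization and copying is the kind of cost the proposed asynchronous facility is meant to avoid.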