Memory-mapping support for reducer hyperobjects

Authors:
I-Ting Angelina Lee;Aamir Shafi;Charles E. Leiserson
Affiliations:
MIT CSAIL, Cambridge, MA, USA;National University of Sciences and Technology, Islamabad, Pakistan;MIT CSAIL, Cambridge, MA, USA
Venue:
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Year:
2012

Abstract

Reducer hyperobjects (reducers) provide a linguistic abstraction for dynamic multithreading that allows different branches of a parallel program to maintain coordinated local views of the same nonlocal variable. In this paper, we investigate how thread-local memory mapping (TLMM) can be used to improve the performance of reducers. Existing concurrency platforms that support reducer hyperobjects, such as Intel Cilk Plus and Cilk++, take a hypermap approach in which a hash table maps reducer objects to their local views. The hash-table lookup is costly: roughly a 12x overhead compared to an ordinary L1-cache memory access on an AMD Opteron 8354. We replaced the Intel Cilk Plus runtime system with our own Cilk-M runtime system, which uses TLMM to implement a reducer mechanism that performs a reducer lookup using only two memory accesses and a predictable branch, roughly a 3x overhead compared to an ordinary L1-cache memory access. An empirical evaluation shows that the Cilk-M memory-mapping approach is close to 4x faster than the Cilk Plus hypermap approach. Furthermore, the memory-mapping approach admits better locality than the hypermap approach during parallel execution, which allows an application using reducers to scale better.
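
The difference between the two lookup strategies can be illustrated with a small conceptual model. The sketch below is not the Cilk Plus or Cilk-M implementation. It assumes a hypothetical `Reducer` class that is assigned a fixed slot index at creation, a `HypermapWorker` that finds local views through a per-worker hash table, and a `SlotWorker` whose plain Python list stands in for a worker's private memory-mapped region. All of these names are invented for illustration. Real runtimes create views on steals and reduce them at joins; here each worker simply gets one view, and the views are merged serially at the end.

```python
from typing import Any, Callable, Dict, List, Optional


class View:
    """Mutable box holding one worker's local view of a reducer."""

    def __init__(self, value: Any) -> None:
        self.value = value


class Reducer:
    """Hypothetical reducer: an identity constructor plus an associative reduce."""

    _next_slot = 0  # assumption: slots are handed out once, at reducer creation

    def __init__(self, identity: Callable[[], Any],
                 reduce: Callable[[Any, Any], Any]) -> None:
        self.identity = identity
        self.reduce = reduce
        self.slot = Reducer._next_slot
        Reducer._next_slot += 1


class HypermapWorker:
    """Hypermap-style lookup: hash the reducer to find this worker's view."""

    def __init__(self) -> None:
        self.views: Dict[int, View] = {}

    def lookup(self, r: Reducer) -> View:
        key = id(r)  # hash-table probe on every access
        if key not in self.views:
            self.views[key] = View(r.identity())
        return self.views[key]


class SlotWorker:
    """Slot-style lookup: index a per-worker region by the reducer's fixed slot.

    The list is only a stand-in for a thread-local mapped region.
    """

    def __init__(self, num_slots: int) -> None:
        self.region: List[Optional[View]] = [None] * num_slots

    def lookup(self, r: Reducer) -> View:
        view = self.region[r.slot]   # direct indexed load, no hashing
        if view is None:             # branch taken only on first access
            view = self.region[r.slot] = View(r.identity())
        return view


def parallel_sum(workers, reducer: Reducer, chunks) -> Any:
    """Simulate workers updating private views, then merge the views in order."""
    for worker, chunk in zip(workers, chunks):
        for x in chunk:
            worker.lookup(reducer).value += x
    total = reducer.identity()
    for worker in workers:
        total = reducer.reduce(total, worker.lookup(reducer).value)
    return total


chunks = [range(0, 10), range(10, 20), range(20, 30)]
sum_reducer = Reducer(identity=lambda: 0, reduce=lambda a, b: a + b)
hypermap_total = parallel_sum([HypermapWorker() for _ in chunks], sum_reducer, chunks)
slot_total = parallel_sum([SlotWorker(num_slots=4) for _ in chunks], sum_reducer, chunks)
print(hypermap_total, slot_total)
```

Both strategies yield the same reduced value. They differ only in how a worker locates its local view: a hash probe on every access in the first case, and an indexed load plus a rarely taken branch in the second. That mirrors, loosely, the contrast the abstract draws between the hypermap and memory-mapping approaches.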