The dominant programming models for multicore processors use threads that run over shared memory. However, as core counts increase, cache coherency protocols become complex and inefficient, and maintaining a shared-memory abstraction becomes expensive and impractical. Moreover, writing multithreaded programs is notoriously difficult: the programmer must reason about all possible thread interleavings, including the many implicit, non-obvious, and often unpredictable thread interactions through shared memory. As processors gain cores and parallel software becomes mainstream, the shared-memory model reaches its limits in both ease of programming and efficiency. This position paper presents two ideas that aim to address the problem. First, we restrict the way the programmer expresses parallelism: the program is a collection of possibly recursive tasks, where each task is atomic and cannot communicate with any other task during its execution. Second, we relax the requirement for coherent shared memory: each task declares its memory footprint and is guaranteed exclusive access to that memory during its execution. Using this model, we can define a runtime system that transparently performs the required data transfers among cores without cache coherency and also produces a deterministic execution of the program, provably equivalent to its sequential elision.
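The model described above can be illustrated with a small sketch. It is not the paper's runtime: the `Task` class, the wave-based scheduler, and the copy-in/copy-out step are assumptions made for illustration, and recursive task spawning is left out. Each task names the regions it touches; tasks whose footprints overlap keep their program order, while tasks with disjoint footprints may run concurrently on private copies of their data, which stands in for explicit data transfers between cores.

```python
from concurrent.futures import ThreadPoolExecutor


class Task:
    """Hypothetical task: a name, a declared memory footprint, and a body."""

    def __init__(self, name, footprint, fn):
        self.name = name
        self.footprint = frozenset(footprint)  # regions the task may touch
        self.fn = fn                           # body; sees only its footprint


def run_sequential(tasks, memory):
    """Sequential elision: run the tasks one by one in program order."""
    for task in tasks:
        task.fn(memory)
    return memory


def schedule_waves(tasks):
    """Place each task one wave after the last earlier task sharing a region."""
    waves = []
    last_wave = {}  # region -> index of the latest wave that touches it
    for task in tasks:
        wave = 1 + max(
            (last_wave[r] for r in task.footprint if r in last_wave),
            default=-1,
        )
        if wave == len(waves):
            waves.append([])
        waves[wave].append(task)
        for region in task.footprint:
            last_wave[region] = wave
    return waves


def _run_isolated(task, memory):
    """Copy the footprint in, run the task on the private copy, return it."""
    local = {region: memory[region] for region in task.footprint}
    task.fn(local)
    return {region: local[region] for region in task.footprint}


def run_parallel(tasks, memory, workers=4):
    """Run each wave concurrently; copy results back between waves."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in schedule_waves(tasks):
            results = list(pool.map(lambda t: _run_isolated(t, memory), wave))
            for updates in results:  # footprints in one wave are disjoint
                memory.update(updates)
    return memory


def scale_a(m):
    m["a"] = m["a"] * 10


def bump_b(m):
    m["b"] = m["b"] + 5


def sum_into_c(m):
    m["c"] = m["a"] + m["b"]


def decrement_a(m):
    m["a"] = m["a"] - 1


INITIAL = {"a": 1, "b": 2, "c": 0}
TASKS = [
    Task("T1", {"a"}, scale_a),
    Task("T2", {"b"}, bump_b),
    Task("T3", {"a", "b", "c"}, sum_into_c),
    Task("T4", {"a"}, decrement_a),
]

if __name__ == "__main__":
    print([[t.name for t in wave] for wave in schedule_waves(TASKS)])
    print(run_parallel(TASKS, dict(INITIAL)))
```

In this sketch `T1` and `T2` have disjoint footprints and share a wave, while `T3` and `T4` overlap with earlier tasks and are deferred. Because conflicting tasks always keep their program order, the parallel run produces the same final memory as `run_sequential`. The footprint is enforced only by handing each task a private copy of its regions; a real runtime would need a stronger mechanism.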