MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Mul-T: a high-performance parallel Lisp
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
A new solution of Dijkstra's concurrent programming problem
Communications of the ACM
Solution of a problem in concurrent programming control
Communications of the ACM
Lock reservation: Java locks can mostly do without atomic operations
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Efficient sequential consistency using conditional fences
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Fast RMWs for TSO: semantics and implementation
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream --- it ensures that all memory references before the fence in the program order have taken effect before the execution continues onto instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the cost of synchronization due to the latency of memory fence execution in parallel algorithms. Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread T1 only when a different thread T2 attempts to read the memory location which is guarded by the location-based memory fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, and thus expected to incur higher overhead than the proposed hardware mechanism would. Nevertheless, our software experiments show that applications can benefit from using location-based memory fences, but they do not scale as well in some cases, due to the software overhead. These results suggest that a hardware support for location-based memory fences is worth considering.