Efficient and correct execution of parallel programs that share memory
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Analyses and optimizations for shared address space programs
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Hiding Relaxed Memory Consistency with a Compiler
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Dependence Analysis in Parallel Loops with i±k Subscripts
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Speculative Sequential Consistency with Little Custom Storage
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Cooperating sequential processes
The origin of concurrent programming
Automatic fence insertion for shared memory multiprocessing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Compiler techniques for high performance sequentially consistent java programs
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Making Sequential Consistency Practical in Titanium
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Practical escape analyses: how good are they?
Proceedings of the 3rd international conference on Virtual execution environments
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
A Scalable, Non-blocking Approach to Transactional Memory
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
There is nothing wrong with out-of-thin-air: compiler optimization and memory models
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
A case for an SC-preserving compiler
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Efficient sequential consistency via conflict ordering
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
End-to-end sequential consistency
Proceedings of the 39th Annual International Symposium on Computer Architecture
Volition: scalable and precise sequential consistency violation detection
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 27th international ACM conference on International conference on supercomputing
WeeFence: toward making fences free in TSO
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Among the various memory consistency models, the sequential consistency (SC) model, in which memory operations appear to take place in the order specified by the program, is the most intuitive and enables programmers to reason about their parallel programs the best. Nevertheless, processor designers often choose to support relaxed memory consistency models because the weaker ordering constraints imposed by such models allow for more instructions to be reordered and enable higher performance. Programs running on machines supporting weaker consistency models, can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering automatically guarantees SC. To ensure that these memory access pairs are not reordered, memory fences are inserted. Unfortunately, insertion of such memory fences can significantly slowdown the program. We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce, is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler inserted memory fences shows that only 8% of the executed instances of the memory fences are really necessary to ensure SC. Motivated by this study we propose the conditional fence mechanism (C-Fence) that utilizes compiler information to decide dynamically if there is a need to stall at each fence. Our experiments with SPLASH-2 benchmarks show that, with C-Fences, programs can be transformed to enforce SC incurring only 12% slowdown, as opposed to 43% slowdown using normal fence instructions. Our approach requires very little hardware support (