We present an algorithm to determine communicator variables in parallel programs. If communicator variables are accessed in program order and accesses to other shared variables are not reordered with respect to communicators, then program executions are sequentially consistent. Computing communicators is an efficient and effective alternative to delay set computation. The algorithm does not require a model of threads or of whole-program control flow, and it tolerates the typical approximations that static program analyses make for threads and data. These properties make the algorithm suitable for multi-threaded object-oriented programs with unstructured parallelism. We demonstrate on several multi-threaded Java programs that the algorithm effectively reduces the number of fences at memory access statements compared to a naive fence insertion algorithm (by 28% on average), and we report the runtime overhead caused by the fences (between 0% and 231%, 81% on average).
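The ordering property stated above can be illustrated with a small message-passing sketch. This is not the paper's algorithm, which computes communicators statically; it only shows what the property buys once a communicator is known. The class name `CommunicatorSketch`, the fields `payload` and `ready`, and the method `runOnce` are hypothetical. The sketch assumes `ready` is the communicator and uses Java's `volatile` as a stand-in for the fences a compiler would insert around communicator accesses.

```java
class CommunicatorSketch {
    // Ordinary shared variable: assumed NOT to be a communicator here.
    static int payload = 0;

    // Hypothetical communicator: the variable the threads use to signal
    // each other. Marking it volatile keeps its accesses in program order
    // and keeps the payload accesses from being reordered across it.
    static volatile boolean ready = false;

    // Runs one producer/consumer exchange and returns what the consumer saw.
    static int runOnce() {
        payload = 0;
        ready = false;
        final int[] observed = new int[1];

        Thread consumer = new Thread(() -> {
            while (!ready) {          // communicator read
                Thread.yield();
            }
            observed[0] = payload;    // ordinary read, ordered after the communicator read
        });

        Thread producer = new Thread(() -> {
            payload = 42;             // ordinary write, ordered before the communicator write
            ready = true;             // communicator write
        });

        consumer.start();
        producer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

In this sketch only the accesses to `ready` need ordering constraints; `payload` stays an ordinary field. A naive scheme would instead place a fence at every shared access, which is the baseline the reported reduction is measured against.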