Efficient and correct execution of parallel programs that share memory
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analyses and optimizations for shared address space programs
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
A conservative data flow algorithm for detecting all pairs of statements that may happen in parallel
SIGSOFT '98/FSE-6 Proceedings of the 6th ACM SIGSOFT international symposium on Foundations of software engineering
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Static conflict analysis for multi-threaded object-oriented programs
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Automatic fence insertion for shared memory multiprocessing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Memory Ordering: A Value-Based Approach
Proceedings of the 31st annual international symposium on Computer architecture
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Compiler techniques for high performance sequentially consistent java programs
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Making Sequential Consistency Practical in Titanium
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
A two-phase escape analysis for parallel java programs
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Iterative context bounding for systematic testing of multithreaded programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Subtleties of Transactional Memory Atomicity Semantics
IEEE Computer Architecture Letters
The java hotspotTM server compiler
JVM'01 Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1
TxLinux: using and managing hardware transactional memory in an operating system
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
Type systems for distributed data sharing
SAS'03 Proceedings of the 10th international conference on Static analysis
The Bulk Multicore architecture for improved programmability
Communications of the ACM - Finding the Fun in Computer Science Education
DRFX: a simple and efficient memory model for concurrent programming languages
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Efficient sequential consistency using conditional fences
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Efficient processor support for DRFx, a memory model with exceptions
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Data-race exceptions have benefits beyond the memory model
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
A case for an SC-preserving compiler
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 38th annual international symposium on Computer architecture
CoreRacer: a practical memory race recorder for multicore x86 TSO processors
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient and accurate data dependence profiling using software signatures
Proceedings of the Tenth International Symposium on Code Generation and Optimization
End-to-end sequential consistency
Proceedings of the 39th Annual International Symposium on Computer Architecture
BlockChop: dynamic squash elimination for hybrid processor architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
DeAliaser: alias speculation using atomic region support
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
A platform that supported Sequential Consistency (SC) for all codes --- not only the well-synchronized ones --- would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well. This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.