Efficient and correct execution of parallel programs that share memory
ACM Transactions on Programming Languages and Systems (TOPLAS)
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Analyses and optimizations for shared address space programs
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor
IEEE Transactions on Computers
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A study of locking objects with bimodal fields
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Lock reservation: Java locks can mostly do without atomic operations
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Scientific data repositories: designing for a moving target
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
TO-Lock: Removing Lock Overhead Using the Owners' Temporal Locality
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatically Reducing Repetitive Synchronization with a Just-in-Time Compiler for Java
Proceedings of the international symposium on Code generation and optimization
Compiler techniques for high performance sequentially consistent java programs
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting causal relationships in distributed computations: in search of the holy grail
Distributed Computing
Operating system exploitation of the POWER5 system
IBM Journal of Research and Development - POWER5 and packaging
JavaTM just-in-time compiler and virtual machine improvements for server and middleware applications
VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
CheckFence: checking consistency of concurrent data types on relaxed memory models
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Reducing Memory Ordering Overheads in Software Transactional Memory
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
RCDC: a relaxed consistency deterministic computer
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Automatic inference of memory fences
Proceedings of the 2010 Conference on Formal Methods in Computer-Aided Design
Bounded model checking of concurrent data types on relaxed memory models: a case study
CAV'06 Proceedings of the 18th international conference on Computer Aided Verification
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Conventional relaxed memory ordering techniques follow a proactive model: at a synchronization point, a processor makes its own updates to memory available to other processors by executing a memory barrier instruction, ensuring that recent writes have been ordered with respect to other processors in the system. We show that this model leads to superfluous memory barriers in programs with acquire-release style synchronization, and present a combined hardware/software synchronization mechanism called conditional memory ordering (CMO) that reduces memory ordering overhead. CMO is demonstrated on a lock algorithm that identifies those dynamic lock/unlock operations for which memory ordering is unnecessary, and speculatively omits the associated memory ordering instructions. When ordering is required, this algorithm relies on a hardware mechanism for initiating a memory ordering operation on another processor. Based on evaluation using a software-only CMO prototype, we show that CMO avoids memory ordering operations for the vast majority of dynamic acquire and release operations across a set of multithreaded Java workloads, leading to significant speedups for many. However, performance improvements in the software prototype are hindered by the high cost of remote memory ordering. Using empirical data, we construct an analytical model demonstrating the benefits of a combined hardware-software implementation.