Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Efficient and correct execution of parallel programs that share memory
ACM Transactions on Programming Languages and Systems (TOPLAS)
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Hiding Relaxed Memory Consistency with a Compiler
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Speculative Sequential Consistency with Little Custom Storage
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Automatic fence insertion for shared memory multiprocessing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Goldilocks: a race and transaction-aware java runtime
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
A Scalable, Non-blocking Approach to Transactional Memory
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Race directed random testing of concurrent programs
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
InvisiFence: performance-transparent memory ordering in conventional multiprocessors
Proceedings of the 36th annual international symposium on Computer architecture
x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors
Communications of the ACM
Memory models: a case for rethinking parallel languages and hardware
Communications of the ACM
DRFX: a simple and efficient memory model for concurrent programming languages
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 37th annual international symposium on Computer architecture
Efficient sequential consistency using conditional fences
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Efficient processor support for DRFx, a memory model with exceptions
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A case for an SC-preserving compiler
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Implicit transactional memory in kilo-instruction multiprocessors
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
End-to-end sequential consistency
Proceedings of the 39th Annual International Symposium on Computer Architecture
Volition: scalable and precise sequential consistency violation detection
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Fast RMWs for TSO: semantics and implementation
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 27th international ACM conference on International conference on supercomputing
WeeFence: toward making fences free in TSO
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Although the sequential consistency (SC) model is the most intuitive, processor designers often choose to support relaxed memory consistency models for higher performance. This is because SC implementations that match the performance of relaxed memory models require post-retirement speculation and its associated hardware costs. In this paper we propose an efficient approach for enforcing SC without requiring post-retirement speculation. While prior SC implementations guarantee SC by explicitly completing memory operations within a processor in program order, we guarantee SC by completing conflicting memory operations, within and across processors, in an order that is consistent with the program order. More specifically, we identify those conflicting memory operations whose ordering is critical for the maintenance of SC and explicitly order them. This allows us to safely (non-speculatively) complete memory operations past pending writes, thus reducing memory ordering stalls. Our experiments with SPLASH-2 programs show that SC can be achieved efficiently, with performance comparable to RMO (relaxed memory order).