Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Speculative Sequential Consistency with Little Custom Storage
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Memory Ordering: A Value-Based Approach
Proceedings of the 31st annual international symposium on Computer architecture
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Programming with transactional coherence and consistency (TCC)
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 33rd annual international symposium on Computer Architecture
Bulk Disambiguation of Speculative Threads in Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Issues in the design of store buffers in dynamically scheduled processors
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for store-wait-free multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
A Scalable, Non-blocking Approach to Transactional Memory
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
POWER4 system microarchitecture
IBM Journal of Research and Development
On the effectiveness of speculative and selective memory fences
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Implicit transactional memory in kilo-instruction multiprocessors
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A real system evaluation of hardware atomicity for software speculation
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Memory models: a case for rethinking parallel languages and hardware
Communications of the ACM
DRFX: a simple and efficient memory model for concurrent programming languages
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
RETCON: transactional repair without replay
Proceedings of the 37th annual international symposium on Computer architecture
Efficient sequential consistency using conditional fences
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Efficient processor support for DRFx, a memory model with exceptions
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A case for an SC-preserving compiler
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic inference of memory fences
Proceedings of the 2010 Conference on Formal Methods in Computer-Aided Design
Proceedings of the 38th annual international symposium on Computer architecture
Efficient sequential consistency via conflict ordering
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
End-to-end sequential consistency
Proceedings of the 39th Annual International Symposium on Computer Architecture
BlockChop: dynamic squash elimination for hybrid processor architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Volition: scalable and precise sequential consistency violation detection
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Fast RMWs for TSO: semantics and implementation
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 27th international ACM conference on International conference on supercomputing
WeeFence: toward making fences free in TSO
Proceedings of the 40th Annual International Symposium on Computer Architecture
BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Fence-free work stealing on bounded TSO processors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.02 |
A multiprocessor's memory consistency model imposes ordering constraints among loads, stores, atomic operations, and memory fences. Even for consistency models that relax ordering among loads and stores, ordering constraints still induce significant performance penalties due to atomic operations and memory ordering fences. Several prior proposals reduce the performance penalty of strongly ordered models using post-retirement speculation, but these designs either (1) maintain speculative state at a per-store granularity, causing storage requirements to grow proportionally to speculation depth, or (2) employ distributed global commit arbitration using unconventional chunk-based invalidation mechanisms. In this paper we propose InvisiFence, an approach for implementing memory ordering based on post-retirement speculation that avoids these concerns. InvisiFence leverages minimalistic mechanisms for post-retirement speculation proposed in other contexts to (1) track speculative state efficiently at block-granularity with dedicated storage requirements independent of speculation depth, (2) provide fast commit by avoiding explicit commit arbitration, and (3) operate under a conventional invalidation-based cache coherence protocol. InvisiFence supports both modes of operation found in prior work: speculating only when necessary to minimize the risk of rollback-inducing violations or speculating continuously to decouple consistency enforcement from the processor core. Overall, InvisiFence requires approximately one kilobyte of additional state to transform a conventional multiprocessor into one that provides performance-transparent memory ordering, fences, and atomic operations.