Memory consistency and event ordering in scalable shared-memory multiprocessors

Authors:
Kourosh Gharachorloo;Daniel Lenoski;James Laudon;Phillip Gibbons;Anoop Gupta;John Hennessy
Affiliations:
Computer Systems Laboratory, Stanford University, CA;Computer Systems Laboratory, Stanford University, CA;Computer Systems Laboratory, Stanford University, CA;Computer Systems Laboratory, Stanford University, CA;Computer Systems Laboratory, Stanford University, CA;Computer Systems Laboratory, Stanford University, CA
Venue:
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Year:
1990

Citing 6
Cited 302

Memory access buffering in multiprocessors

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Correct memory operation of cache-based multiprocessors

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Efficient and correct execution of parallel programs that share memory

ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel implementation of OPS5 on the encore multiprocessor: results and analysis

International Journal of Parallel Programming
Access ordering and coherence in shared memory multiprocessors

Access ordering and coherence in shared memory multiprocessors
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture

Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Efficient Doacross execution on distributed shared-memory multiprocessors

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
LimitLESS directories: A scalable cache coherence scheme

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance evaluation of memory consistency models for shared-memory multiprocessors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Synchronization without contention

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Proving sequential consistency of high-performance shared memories (extended abstract)

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Detecting violations of sequential consistency

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Race-free interconnection networks and multiprocessor consistency

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Detecting data races on weak memory systems

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Implementation and performance of Munin

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Debuggable concurrency extensions for standard ML

PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
A correctness condition for high-performance multiprocessors (extended abstract)

STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
A performance study of memory consistency models

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Lazy release consistency for software distributed shared memory

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Hiding memory latency using dynamic scheduling in shared-memory multiprocessors

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Parallel Programming Using Shared Objects and Broadcasting

Computer - Special issue on sharing: high performance at low cost
Specifying non-blocking shared memories (extended abstract)

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Distributed shared memory with versioned objects

OOPSLA '92 conference proceedings on Object-oriented programming systems, languages, and applications
Willow: a scalable shared memory multiprocessor

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Cache consistency in hierarchical-ring-based multiprocessors

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Lazy caching

ACM Transactions on Programming Languages and Systems (TOPLAS)
Integrating message-passing and shared-memory: early experience

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
A concurrent copying garbage collector for languages that distinguish (im)mutable data

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The shared regions approach to software cache coherence on multiprocessors

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Causal controversy at Le Mont St.-Michel

ACM SIGOPS Operating Systems Review
Distribution in a single address space operating system

ACM SIGOPS Operating Systems Review
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons

ACM Computing Surveys (CSUR)
Memory consistency models

ACM SIGOPS Operating Systems Review
Implementing hybrid consistency with high-level synchronization operations

PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Evaluation of release consistent software distributed shared memory on emerging network technology

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Shared memory consistency conditions for non-sequential execution: definitions and programming strategies

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
The power of processor consistency

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Anatomy of a message in the Alewife multiprocessor

ICS '93 Proceedings of the 7th international conference on Supercomputing
Dynamic switching of coherent cache protocols and its effects on Doacross loops

ICS '93 Proceedings of the 7th international conference on Supercomputing
A bibliography of parallel debuggers, 1993 edition

PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
A cache coherence scheme suitable for massively parallel processors

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A distributed shared memory multiprocessor ASURA: memory and cache architecture

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Sequential consistency versus linearizability

ACM Transactions on Computer Systems (TOCS)
On testing cache-coherent shared memories

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Performance evaluation of hybrid hardware and software distributed shared memory protocols

ICS '94 Proceedings of the 8th international conference on Supercomputing
Software versus hardware shared-memory implementation: a case study

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Simple compiler algorithms to reduce ownership overhead in cache coherence protocols

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Interleaving: a multithreading technique targeting multiprocessors and workstations

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reconciliations

PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Making operations of concurrent data types fast

PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Mixed consistency: a model for parallel programming (extended abstract)

PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
A checkpoint protocol for an entry consistent shared memory system

PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Optimizing parallel programs with explicit synchronization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Software caching and computation migration in Olden

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Techniques for reducing consistency-related communication in distributed shared-memory systems

ACM Transactions on Computer Systems (TOCS)
An executable specification, analyzer and verifier for RMO (relaxed memory order)

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Problem-oriented object memory: customizing consistency

Proceedings of the tenth annual conference on Object-oriented programming systems, languages, and applications
A comprehensive bibliography of distributed shared memory

ACM SIGOPS Operating Systems Review
An analytic study of dynamic hardware and software cache coherence strategies

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance of cache coherence in stackable filing

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Message passing versus distributed shared memory on networks of workstations

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Lazy release consistency for hardware-coherent multiprocessors

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Architectural mechanisms for explicit communication in shared memory multiprocessors

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Techniques for reducing overheads of shared-memory multiprocessing

ICS '95 Proceedings of the 9th international conference on Supercomputing
A compiler-directed distributed shared memory system

ICS '95 Proceedings of the 9th international conference on Supercomputing
Data forwarding in scalable shared-memory multiprocessors

ICS '95 Proceedings of the 9th international conference on Supercomputing
A proposal of self-cleanup cache

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Efficient strategies for software-only protocols in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Distributed Shared Abstractions (DSA) on Multiprocessors

IEEE Transactions on Software Engineering
Understanding application performance on shared virtual memory systems

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols

ACM Transactions on Programming Languages and Systems (TOPLAS)
An evaluation of memory consistency models for shared-memory systems with ILP processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An integrated compile-time/run-time software distributed shared memory system

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
SoftFLASH: analyzing the performance of clustered distributed virtual shared memory

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Scope consistency: a bridge between release consistency and entry consistency

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
An analysis of dag-consistent distributed shared-memory algorithms

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A combined-consistency approach: sequential & causal-consistency

ACM SIGOPS Operating Systems Review
An Architecture for Tolerating Processor Failures in Shared-Memory Multiprocessors

IEEE Transactions on Computers
Data Forwarding in Scalable Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Verification techniques for cache coherence protocols

ACM Computing Surveys (CSUR)
Group consistency model which separates the intra-group consistency maintenance from the inter-group consistency maintenance in large scale DSM systems

ACM SIGOPS Operating Systems Review
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Temporal notions of synchronization and consistency in Beehive

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Performance debugging shared memory parallel programs using run-time dependence analysis

SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Transactional client-server cache consistency: alternatives and performance

ACM Transactions on Database Systems (TODS)
How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor

IEEE Transactions on Computers
Distributed shared memory systems with improved barrier synchronization and data transfer

ICS '97 Proceedings of the 11th international conference on Supercomputing
Compiler and software distributed shared memory support for irregular applications

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs between false sharing and aggregation in software distributed shared memory

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Relaxed consistency and coherence granularity in DSM systems: a performance evaluation

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The interaction of software prefetching with ILP processors in shared-memory systems

Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB

Proceedings of the 24th annual international symposium on Computer architecture
A Survey of Recoverable Distributed Shared Virtual Memory Systems

IEEE Transactions on Parallel and Distributed Systems
Towards transparent and efficient software distributed shared memory

Proceedings of the sixteenth ACM symposium on Operating systems principles
Computing global virtual time in shared-memory multiprocessors

ACM Transactions on Modeling and Computer Simulation (TOMACS)
An interaction of coherence protocols and memory consistency models in DSM systems

ACM SIGOPS Operating Systems Review
Tolerating latency in multiprocessors through compiler-inserted prefetching

ACM Transactions on Computer Systems (TOCS)
Implementing sequentially consistent shared objects using broadcast and point-to-point communication

Journal of the ACM (JACM)
Per-Node Multithreading and Remote Latency

IEEE Transactions on Computers
Lamport clocks: verifying a directory cache-coherence protocol

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Computation-centric memory models

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
OPTNET: a cost-effective optical network for multiprocessors

ICS '98 Proceedings of the 12th international conference on Supercomputing
Adapting the Network Interface for High-Performance Computing: The CNI Approach

The Journal of Supercomputing - Special issue: high performance distributed computing
Formal verification of complex coherence protocols using symbolic state models

Journal of the ACM (JACM)
Retrospective: memory access buffering in multiprocessors

25 years of the international symposia on Computer architecture (selected papers)
Retrospective: weak ordering—a new definition

25 years of the international symposia on Computer architecture (selected papers)
Hardware Support for Flexible Distributed Shared Memory

IEEE Transactions on Computers
Tapeworm: high-level abstractions of shared accesses

OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
An Executable Specification and Verifier for Relaxed Memory Order

IEEE Transactions on Computers - Special issue on cache memory and related problems
Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
CACHET: an adaptive cache coherence protocol for distributed shared-memory systems

ICS '99 Proceedings of the 13th international conference on Supercomputing
Shared virtual memory with automatic update support

ICS '99 Proceedings of the 13th international conference on Supercomputing
The scalability of multigrain systems

ICS '99 Proceedings of the 13th international conference on Supercomputing
A system-level specification framework for I/O architectures

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
PSCR: A Coherence Protocol for Eliminating Passive Sharing in Shared-Bus Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Teapot: A Domain-Specific Language for Writing Cache Coherence Protocols

IEEE Transactions on Software Engineering
Scalable Consistency Protocols for Distributed Services

IEEE Transactions on Parallel and Distributed Systems
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A decentralized communication efficient distributed shared memory

SAC '96 Proceedings of the 1996 ACM symposium on Applied Computing
A high-level abstraction of shared accesses

ACM Transactions on Computer Systems (TOCS)
Hardware spatial forwarding for widely shared data

Proceedings of the 14th international conference on Supercomputing
Implementing a caching service for distributed CORBA objects

IFIP/ACM International Conference on Distributed systems platforms
An asynchronous protocol for release consistent distributed shared memory systems

SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
Formal Automatic Verification of Cache Coherence in Multiprocessors with Relaxed Memory Models

IEEE Transactions on Parallel and Distributed Systems
Location Consistency-A New Memory Model and Cache Consistency Protocol

IEEE Transactions on Computers
A Protocol-Centric Approach to on-the-Fly Race Detection

IEEE Transactions on Parallel and Distributed Systems
Java consistency: nonoperational characterizations for Java memory behavior

ACM Transactions on Computer Systems (TOCS)
Scalable fault-tolerant distributed shared memory

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Core semantics of multithreaded Java

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
Runtime optimizations for a Java DSM implementation

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
View-based consistency and false sharing effect in distributed shared memory

ACM SIGOPS Operating Systems Review
Modeling weakly consistent memories with locks

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Contention elimination by replication of sequential sections in distributed shared memory programs

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Causal memory meets the consistency and performance needs of distributed applications!

EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
Distributed shared memory: experience with Munin

EW 5 Proceedings of the 5th workshop on ACM SIGOPS European workshop: Models and paradigms for distributed systems structuring
Distribution in a single address space operating system

EW 5 Proceedings of the 5th workshop on ACM SIGOPS European workshop: Models and paradigms for distributed systems structuring
Hiding Relaxed Memory Consistency with a Compiler

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Lazy Garbage Collection of Recovery State for Fault-Tolerant Distributed Shared Memory

IEEE Transactions on Parallel and Distributed Systems
Design and Performance Analysis of a Distributed Java Virtual Machine

IEEE Transactions on Parallel and Distributed Systems
Two-handed emulation: how to build non-blocking implementations of complex data-structures using DCAS

Proceedings of the twenty-first annual symposium on Principles of distributed computing
Removing the overhead from software-based shared memory

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Shared State for Distributed Interactive Data Mining Applications

Distributed and Parallel Databases - Special issue: Parallel and distributed data mining
Run-time support for distributed sharing in safe languages

ACM Transactions on Computer Systems (TOCS)
An Application-Driven Study of Multicast Communication for Write Invalidation

The Journal of Supercomputing
Y-Invalidate: A New Protocol for Implementing Weak Consistency in DSM Systems

International Journal of Parallel Programming
Selecting threads for workload migration in software distributed shared memory systems

Parallel Computing
Distributed Shared Memory: Concepts and Systems

IEEE Parallel & Distributed Technology: Systems & Technology
Multiprocessor Validation of the Pentium Pro

Computer
Shared Memory Consistency Models: A Tutorial

Computer
Storage in the PowerPC

IEEE Micro
The MAJC Architecture: A Synthesis of Parallelism and Scalability

IEEE Micro
Accuracy of Memory Reference Traces of Parallel Computations in Trace-Driven Simulation

IEEE Transactions on Parallel and Distributed Systems
Evaluation of NUMA Memory Management Through Modeling and Measurements

IEEE Transactions on Parallel and Distributed Systems
The DASH Prototype: Logic Overhead and Performance

IEEE Transactions on Parallel and Distributed Systems
A Unified Formalization of Four Shared-Memory Models

IEEE Transactions on Parallel and Distributed Systems
Access Graphs: A Model for Investigating Memory Consistency

IEEE Transactions on Parallel and Distributed Systems
Sequential Hardware Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
The Working-Set Based Adaptive Protocol for Software Distributed Shared Memory

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
The Affinity Entry Consistency Protocol

ICPP '97 Proceedings of the international Conference on Parallel Processing
The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Dag-Consistent Distributed Shared Memory

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Aurora: Scoped Behavior for Per-Context Optimized Distributed Data Sharing

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Enhancing Software DSM for Compiler-Parallelized Applications

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Study of the Efficiency of Shared Attraction Memories in Cluster-Based COMA Multiprocessors

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Speculative Sequential Consistency with Little Custom Storage

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Specification and Verification Framework for Developing Weak Shared Memory Consistency Protocols

FMCAD '02 Proceedings of the 4th International Conference on Formal Methods in Computer-Aided Design
How Can We Design Better Networks for DSM Systems?

PCRCW '97 Proceedings of the Second International Workshop on Parallel Computer Routing and Communication
The Impact of Cache Coherence Protocols on Systems using Fine-Grain Data Synchronization

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
CableS: Thread Control and Memory System Extensions for Shared Virtual Memory Clusters

WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Improving the Performance of Heterogeneous DSMs via Multithreading

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
InterAct: Virtual Sharing for Interactive Client-Server Applications

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Compiler and Run-Time Support for Adaptive Load Balancing in Software Distributed Shared Memory Systems

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
InterWeave: A Middleware System for Distributed Shared State

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Run-Time Support for Distributed Sharing in Typed Languages

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
The Effect of Contention on the Scalability of Page-Based Software Shared Memory Systems

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
View consistencies and exact implementations

Parallel Computing
Multiple-writer entry consistency

Cluster computing
An efficient causal logging scheme for recoverable distributed shared memory systems

Parallel Computing
Information-Flow Models for Shared Memory with an Application to the PowerPC Architecture

IEEE Transactions on Parallel and Distributed Systems
Inferential queueing and speculative push for reducing critical communication latencies

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Automatic fence insertion for shared memory multiprocessing

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Executing Java threads in parallel in a distributed-memory environment

CASCON '98 Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research
Multiprocessor validation of the Pentium Pro microprocessor

COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Lock improvement technique for release consistency in distributed shared memory systems

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Hardware Controlled Prefetching in Directory-Based Cache Coherent Systems

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Improving Release-Consistent Shared Virtual Memory using Automatic Update

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Comparison of Entry Consistency and Lazy Release Consistency Implementations

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Relaxing Cache Coherence Protocol with QOLB Synchronizations

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
CNI: A High-Performance Network Interface for Workstation Clusters

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Dynamically Controlling False Sharing in Distributed Shared Memory

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
A Performance Debugger for Eliminating Excess Synchronization in Shared-Memory Parallel Programs

MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Cohesion: an efficient distributed shared memory system supporting multiple memory consistency models

PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Lazy TLB Consistency for Large-Scale Multiprocessors

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Logging and Recovery in Adaptive Software Distributed Shared Memory Systems

SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
The Thread-Based Protocol Engines for CC-NUMA Multiprocessors

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Sparks: coherence as an abstract type

IWOOOS '96 Proceedings of the 5th International Workshop on Object Orientation in Operating Systems (IWOOOS '96)
Locality and Performance of Page- and Object-Based DSMs

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
An Efficient Logging Scheme for Lazy Release Consistent Distributed Shared Memory Systems

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Characterizations for Java Memory Behavior

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Checkpointing and Recovery for Distributed Shared Memory Applications

IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
A Framework of Customizing Transactions in Persistent Object Management for Advanced Applications

IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
References

Sourcebook of parallel computing
Consistency and event ordering in the shared regions model

CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Hundreds of impossibility results for distributed computing

Distributed Computing - Papers in celebration of the 20th anniversary of PODC
Consistency models for Internet caching

WISICT '04 Proceedings of the winter international symposium on Information and communication technologies
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
CAS-DSM: a compiler assisted software distributed shared memory

International Journal of Parallel Programming
A unified theory of shared memory consistency

Journal of the ACM (JACM)
Coherence decoupling: making use of incoherence

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A page-coherent, causally consistent protocol for distributed shared memory

Journal of Systems and Software
A common framework for inter-process communication on a cluster

ACM SIGOPS Operating Systems Review
A comparative evaluation of hardware-only and software-only directory protocols in shared-memory multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
The Java memory model

Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Speculative Incoherent Cache Protocols

IEEE Micro
Seven-O'Clock: A New Distributed GVT Algorithm Using Network Atomic Operations

Proceedings of the 19th Workshop on Principles of Advanced and Distributed Simulation
Implementing hybrid consistency with high-level synchronization operations

Distributed Computing
Store Memory-Level Parallelism Optimizations for Commercial Applications

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Making Sequential Consistency Practical in Titanium

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
On the correctness of program execution when cache coherence is maintained locally at data-sharing boundaries in distributed shared memory multiprocessors

International Journal of Parallel Programming
Inferential queueing and speculative push

International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Memory Model = Instruction Reordering + Store Atomicity

Proceedings of the 33rd annual international symposium on Computer Architecture
Conditional Memory Ordering

Proceedings of the 33rd annual international symposium on Computer Architecture
Addressing a workload characterization study to the design of consistency protocols

The Journal of Supercomputing
What do high-level memory models mean for transactions?

Proceedings of the 2006 workshop on Memory system performance and correctness
Lightweight lock-free synchronization methods for multithreading

Proceedings of the 20th annual international conference on Supercomputing
CycleMeter: detecting fraudulent peers in internet cycle sharing

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Integrating coordinated checkpointing and recovery mechanisms into DSM synchronization barriers

Journal of Experimental Algorithmics (JEA)
Parallel strategies for the local biological sequence alignment in a cluster of workstations

Journal of Parallel and Distributed Computing
A theory of memory models

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Using fine grain multithreading for energy efficient computing

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Message-driven relaxed consistency in a software distributed shared memory

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Software write detection for a distributed shared memory

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Integrating coherency and recoverability in distributed systems

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Brazos: a third generation DSM system

NT'97 Proceedings of the USENIX Windows NT Workshop 1997
Experience with a language for writing coherence protocols

DSL'97 Proceedings of the Conference on Domain-Specific Languages (DSL), 1997
Implementing optimized distributed data sharing using scoped behaviour and a class library

COOTS'97 Proceedings of the 3rd USENIX Conference on Object-Oriented Technologies (COOTS) - Volume 3
Store Atomicity for Transactional Memory

Electronic Notes in Theoretical Computer Science (ENTCS)
Efficient user-level thread migration and checkpointing on Windows NT clusters

WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
Embedded real-time architecture for level-set-based active contours

EURASIP Journal on Applied Signal Processing
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Implementing sequentially consistent programs on processor consistent platforms

Journal of Parallel and Distributed Computing
Foundations of the C++ concurrency memory model

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Atom-Aid: Detecting and Surviving Atomicity Violations

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Modularity: a first class concept to address distributed systems

ACM SIGACT News
The Verification of the On-Chip COMA Cache Coherence Protocol

AMAST 2008 Proceedings of the 12th international conference on Algebraic Methodology and Software Technology
COMIC: a coherent shared memory interface for Cell BE

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
CoMPSoC: A template for composable and predictable multi-processor system on chips

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Relaxed memory models: an operational approach

Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Interconnection of distributed memory models

Journal of Parallel and Distributed Computing
Identifying Inter-task Communication in Shared Memory Programming Models

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Data caching, garbage collection, and the Java memory model

Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems
An on-chip interconnect and protocol stack for multiple communication paradigms and programming models

CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A tuneable software cache coherence protocol for heterogeneous MPSoCs

CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Engineering Distributed Shared Memory Middleware for Java

OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part I
Race-free and memory-safe multithreading: design and implementation in Cyclone

Proceedings of the 5th ACM SIGPLAN workshop on Types in language design and implementation
Implicit and explicit transactions in a distributed transactional memory system

PDCN '08 Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks
A unified formal specification and analysis of the new Java memory models

ASM'03 Proceedings of the abstract state machines 10th international conference on Advances in theory and practice
Scalability of relaxed consistency models in NoC based multicore architectures

ACM SIGARCH Computer Architecture News
Parallel DNA sequence alignment using a DSM system in a cluster of workstations

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
DRFX: a simple and efficient memory model for concurrent programming languages

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Fuzzy group membership

Future directions in distributed computing
Inter-task communication via overlapping read and write windows for deadlock-free execution of cyclic task graphs

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
LReplay: a pending period based deterministic replay scheme

Proceedings of the 37th annual international symposium on Computer architecture
An OpenCL framework for heterogeneous multicores with local memory

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Exploiting locality: a flexible DSM approach

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Monitoring remotely executing shared memory programs in software DSMs

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Efficient system-enforced deterministic parallelism

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
WOMM: a weak operational memory model

ISoLA'10 Proceedings of the 4th international conference on Leveraging applications of formal methods, verification, and validation - Volume Part I
Quasi-linearizability: relaxed consistency for improved concurrency

OPODIS'10 Proceedings of the 14th international conference on Principles of distributed systems
RCDC: a relaxed consistency deterministic computer

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Realization and performance comparison of sequential and weak memory consistency models in network-on-chip based multi-core systems

Proceedings of the 16th Asia and South Pacific Design Automation Conference
A minimalist cache coherent MPSoC designed for FPGAs

International Journal of High Performance Systems Architecture
Chip multithreaded consistency model

Journal of Computer Science and Technology
The impact of memory models on software reliability in multiprocessors

Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Deterministic OpenMP for race-free parallelism

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Evaluating the impact of thread escape analysis on a memory consistency model-aware compiler

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
An efficient software shared virtual memory for the single-chip cloud computer

Proceedings of the Second Asia-Pacific Workshop on Systems
View-Oriented parallel programming and view-based consistency

PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
A real time MPEG-4 parallel encoder on software distributed shared memory systems

ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
On consistency of encrypted files

DISC'06 Proceedings of the 20th international conference on Distributed Computing
Internally deterministic parallel algorithms can be fast

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Data hiding in compiled program binaries for enhancing computer system performance

IH'05 Proceedings of the 7th international conference on Information Hiding
Integrating coordinated checkpointing and recovery mechanisms into DSM synchronization barriers

WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
A theory of speculative computation

ESOP'10 Proceedings of the 19th European conference on Programming Languages and Systems
Verification of the Java causality requirements

HVC'05 Proceedings of the First Haifa international conference on Hardware and Software Verification and Testing
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures

ACM Transactions on Embedded Computing Systems (TECS)
Data-race and concurrent-write freedom are undecidable

Computer Languages, Systems and Structures
Exclusive Access to Resources in Distributed Shared Memory Architecture

Fundamenta Informaticae - Concurrency Specification and Programming (CS&P)
Implicit transactional memory in kilo-instruction multiprocessors

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Domains: safe sharing among actors

Proceedings of the 2nd edition on Programming systems, languages and applications based on actors, agents, and decentralized control abstractions
Conversion: multi-version concurrency control for main memory segments

Proceedings of the 8th ACM European Conference on Computer Systems
Heterogeneous-race-free memory models

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Techniques to improve performance in requester-wins hardware transactional memory

ACM Transactions on Architecture and Code Optimization (TACO)
Static safety guarantees for a low-level multithreaded language with regions

Science of Computer Programming

Abstract

Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture.

This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.
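
As a rough illustration of what "sufficient synchronization" can look like in practice, the sketch below uses C++ `std::atomic` with release and acquire orderings. This is an assumption made for illustration only: the C++ memory model is a later, language-level design that is related to, but not the same as, the hardware release consistency model the abstract describes. All names (`run_handoff`, `payload`, `ready`) are hypothetical and do not come from the paper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: one thread publishes ordinary data, then performs a
// "release" on a flag; another thread performs an "acquire" on the flag and
// only then reads the data.
int run_handoff() {
    int payload = 0;                 // ordinary (non-synchronizing) shared data
    std::atomic<bool> ready{false};  // synchronization variable

    std::thread producer([&] {
        payload = 42;  // ordinary write; may be buffered or pipelined
        // Release: the write to payload must be visible to any thread
        // that later observes ready == true through an acquire load.
        ready.store(true, std::memory_order_release);
    });

    int seen = -1;
    std::thread consumer([&] {
        // Acquire: accesses after this load cannot be moved before it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the producer's write to payload
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // holds because the release/acquire pair orders the accesses
    return seen;
}
```

Calling `run_handoff()` returns the value the consumer observed. Because every access to `payload` is ordered through the release/acquire pair on `ready`, the program has no data race, and its result matches what a sequentially consistent execution would produce, even though the ordinary write itself carries no ordering constraint.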