Memory access buffering in multiprocessors

Authors:
M. Dubois, C. Scheurich, F. Briggs
Affiliations:
Computer Research Institute, University of Southern California, Los Angeles, California (M. Dubois, C. Scheurich); Dept. of Electrical and Computer Engineering, Rice University, Houston, Texas (F. Briggs)
Venue:
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Year:
1986

Abstract

In highly pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This reduces the average memory access latency and takes advantage of memory interleaving. Lockup-free caches are designed to keep the processor from blocking on a cache miss. Write buffers are often included in a pipelined machine so that the processor does not have to wait for writes to complete. In a shared memory multiprocessor, buffering memory requests is even more advantageous, since each memory access must traverse the memory-processor interconnection and compete with memory requests issued by other processors. Buffering, however, can cause logical problems in multiprocessors. These problems are aggravated if each processor has a private memory that may hold shared writable data, as in a cache-based system or in a system with a distributed global memory. In this paper, we analyze the benefits and problems associated with the buffering of memory requests in shared memory multiprocessors. We show that the logical problem of buffering is directly related to the problem of synchronization. A simple model is presented to evaluate the performance improvement resulting from buffering.
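The "logical problem" of buffering can be illustrated with the classic store-buffering (Dekker-style) test. The sketch below is a toy model of our own, not the model from the paper. It assumes FIFO per-processor write buffers with store-to-load forwarding and a hypothetical `fence` operation that stalls until the local buffer has drained. This `fence` stands in for the kind of synchronization point the abstract relates the problem to. The names `outcomes`, `SB`, and `SB_FENCED` are illustrative. The model enumerates every interleaving and reports which final register values are reachable.

```python
"""Toy model of per-processor FIFO write buffers (illustrative sketch only).

With buffering, a write enters the issuing processor's FIFO buffer and reaches
shared memory only at a later, nondeterministic "drain" step. A read returns
the newest matching entry in its own buffer (store forwarding), otherwise the
value in shared memory. Every interleaving is explored exhaustively.
"""


def outcomes(programs, buffered):
    """Return the set of final (r0, r1, ...) tuples over all interleavings."""
    results = set()

    def explore(pcs, buffers, memory, regs):
        moved = False
        for p, prog in enumerate(programs):
            # Option 1: drain processor p's oldest buffered write to memory.
            if buffers[p]:
                (addr, val), rest = buffers[p][0], buffers[p][1:]
                explore(pcs, buffers[:p] + (rest,) + buffers[p + 1:],
                        {**memory, addr: val}, regs)
                moved = True
            # Option 2: execute processor p's next instruction, if any.
            if pcs[p] == len(prog):
                continue
            op = prog[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op[0] == "write":
                _, addr, val = op
                if buffered:
                    buf = buffers[p] + ((addr, val),)
                    explore(next_pcs, buffers[:p] + (buf,) + buffers[p + 1:],
                            memory, regs)
                else:  # unbuffered: the write updates memory immediately
                    explore(next_pcs, buffers, {**memory, addr: val}, regs)
                moved = True
            elif op[0] == "read":
                _, addr, reg = op
                forwarded = [v for a, v in buffers[p] if a == addr]
                value = forwarded[-1] if forwarded else memory.get(addr, 0)
                explore(next_pcs, buffers, memory, {**regs, reg: value})
                moved = True
            elif op[0] == "fence" and not buffers[p]:
                # Hypothetical fence: retires only once the local buffer is empty.
                explore(next_pcs, buffers, memory, regs)
                moved = True
        if not moved:  # all programs finished and all buffers drained
            results.add(tuple(regs[r] for r in sorted(regs)))

    explore(tuple(0 for _ in programs), tuple(() for _ in programs), {}, {})
    return results


# Store-buffering test; all memory locations start at 0.
SB = [
    [("write", "x", 1), ("read", "y", "r0")],  # P0
    [("write", "y", 1), ("read", "x", "r1")],  # P1
]
SB_FENCED = [
    [("write", "x", 1), ("fence",), ("read", "y", "r0")],  # P0
    [("write", "y", 1), ("fence",), ("read", "x", "r1")],  # P1
]

if __name__ == "__main__":
    print(sorted(outcomes(SB, buffered=False)))        # [(0, 1), (1, 0), (1, 1)]
    print(sorted(outcomes(SB, buffered=True)))         # [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(sorted(outcomes(SB_FENCED, buffered=True)))  # [(0, 1), (1, 0), (1, 1)]
```

Without buffering, the outcome (r0, r1) = (0, 0) is unreachable, as sequential consistency requires. With write buffers it becomes reachable, because each read can complete while the other processor's write is still sitting in its buffer. Forcing the buffer to drain before the read removes that outcome again, at the cost of stalling the processor. This is the trade-off between the performance benefit of buffering and correct synchronization.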