ACM Computing Surveys (CSUR)
Concepts and Notations for Concurrent Programming
ACM Computing Surveys (CSUR)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Supercomputers - Design and Applications
Supercomputers - Design and Applications
The Architecture of Symbolic Computers
The Architecture of Symbolic Computers
Computer Architecture and Parallel Processing
Computer Architecture and Parallel Processing
A communication structure for a multiprocessor computer with distributed global memory
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Effects of buffered memory requests in multiprocessor systems
SIGMETRICS '79 Proceedings of the 1979 ACM SIGMETRICS conference on Simulation, measurement and modeling of computer systems
On cacheability of lock-variables in tightly coupled multiprocessor systems
ACM SIGARCH Computer Architecture News
Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
The design of a lockup-free cache for high-performance multiprocessors
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Cache coherence for large scale shared memory multiprocessors
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Utilizing virtual shared memory in a topology independent, multicomputer environment
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Efficient Doacross execution on distributed shared-memory multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A high-performance, memory-based interconnection system for multicomputer environments
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Architectural primitives for a scalable shared memory multiprocessor
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Proving sequential consistency of high-performance shared memories (extended abstract)
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Sequential consistency versus linearizability (extended abstract)
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Detecting violations of sequential consistency
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Race-free interconnection networks and multiprocessor consistency
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Detecting data races on weak memory systems
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Cache coherence for large scale shared memory multiprocessors
ACM SIGARCH Computer Architecture News - Symposium on parallel algorithms and architectures
A conflict-free memory design for multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A correctness condition for high-performance multiprocessors (extended abstract)
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
Experiences in integrating distributed shared memory with virtual memory management
ACM SIGOPS Operating Systems Review
A performance study of memory consistency models
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Hiding memory latency using dynamic scheduling in shared-memory multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Comparative performance evaluation of cache-coherent NUMA and COMA architectures
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cache consistency in hierarchical-ring-based multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
An effective write policy for software coherence schemes
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data flow equations for explicitly parallel programs
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
ACM SIGOPS Operating Systems Review
The process group approach to reliable distributed computing
Communications of the ACM
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The power of processor consistency
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
ICS '93 Proceedings of the 7th international conference on Supercomputing
Dynamic switching of coherent cache protocols and its effects on Doacross loops
ICS '93 Proceedings of the 7th international conference on Supercomputing
The KSR1: experimentation and modeling of poststore
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A distributed shared memory multiprocessor ASURA: memory and cache architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Sequential consistency versus linearizability
ACM Transactions on Computer Systems (TOCS)
An evaluation of a compiler optimization for improving the performance of a coherence directory
ICS '94 Proceedings of the 8th international conference on Supercomputing
Performance evaluation of hybrid hardware and software distributed shared memory protocols
ICS '94 Proceedings of the 8th international conference on Supercomputing
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Mixed consistency: a model for parallel programming (extended abstract)
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
IEEE Transactions on Parallel and Distributed Systems
An executable specification, analyzer and verifier for RMO (relaxed memory order)
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
On shortest path routing in single stage shuffle-exchange networks
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A proposal of self-cleanup cache
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Verification techniques for cache coherence protocols
ACM Computing Surveys (CSUR)
A Survey of Recoverable Distributed Shared Virtual Memory Systems
IEEE Transactions on Parallel and Distributed Systems
An interaction of coherence protocols and memory consistency models in DSM systems
ACM SIGOPS Operating Systems Review
Lamport clocks: verifying a directory cache-coherence protocol
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Computation-centric memory models
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Formal verification of complex coherence protocols using symbolic state models
Journal of the ACM (JACM)
Retrospective: weak ordering—a new definition
25 years of the international symposia on Computer architecture (selected papers)
Weak ordering—a new definition
25 years of the international symposia on Computer architecture (selected papers)
Memory consistency and event ordering in scalable shared-memory multiprocessors
25 years of the international symposia on Computer architecture (selected papers)
An Executable Specification and Verifier for Relaxed Memory Order
IEEE Transactions on Computers - Special issue on cache memory and related problems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
PLUS: a distributed shared-memory system
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An asynchronous protocol for release consistent distributed shared memory systems
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
The Journal of Supercomputing
Formal Automatic Verification of Cache Coherence in Multiprocessors with Relaxed Memory Models
IEEE Transactions on Parallel and Distributed Systems
View-based consistency and false sharing effect in distributed shared memory
ACM SIGOPS Operating Systems Review
Hiding Relaxed Memory Consistency with a Compiler
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
IEEE Transactions on Parallel and Distributed Systems
IEEE Micro
Performance of Pruning-Cache Directories for Large-Scale Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A Unified Formalization of Four Shared-Memory Models
IEEE Transactions on Parallel and Distributed Systems
Access Graphs: A Model for Investigating Memory Consistency
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Memory Consistency and Process Coordination for SPARC Multiprocessors
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Dag-Consistent Distributed Shared Memory
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Bounds for Mutual Exclusion with only Processor Consistency
DISC '00 Proceedings of the 14th International Conference on Distributed Computing
A Technique for the Distributed Simulation of Parallel Computers
MASCOTS '95 Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Automatic fence insertion for shared memory multiprocessing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Hardware Controlled Prefeching in Directory-Based Cache Coherent Systems
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Relaxing Cache Coherence Protocol with QOLB Synchronizations
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Sourcebook of parallel computing
Consistency and event ordering in the shared regions model
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Tolerating Late Memory Traps in Dynamically Scheduled Processors
IEEE Transactions on Computers
CAS-DSM: a compiler assisted software distributed shared memory
International Journal of Parallel Programming
A unified theory of shared memory consistency
Journal of the ACM (JACM)
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Maintaining Consistency and Bounding Capacity of Software Code Caches
Proceedings of the international symposium on Code generation and optimization
Proving refinement using transduction
Distributed Computing - Special issue: Verification of lazy caching
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit
International Journal of High Performance Computing Applications
High Performance Remote Memory Access Communication: The Armci Approach
International Journal of High Performance Computing Applications
Tight Bounds for Critical Sections in Processor Consistent Platforms
IEEE Transactions on Parallel and Distributed Systems
Specifying memory consistency of write buffer multiprocessors
ACM Transactions on Computer Systems (TOCS)
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Message-driven relaxed consistency in a software distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Embedded real-time architecture for level-set-based active contours
EURASIP Journal on Applied Signal Processing
RMOST: A Shared Memory Model for Online Steering
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part III
The semantics of x86-CC multiprocessor machine code
Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Relaxed memory models: an operational approach
Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Synthesis from multi-cycle atomic actions as a solution to the timing closure problem
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Complete formal specification of the OpenMP memory model
International Journal of Parallel Programming
Identifying Inter-task Communication in Shared Memory Programming Models
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Steering of sequential jobs with a distributed shared memory based model for online steering
Future Generation Computer Systems
A tuneable software cache coherence protocol for heterogeneous MPSoCs
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
LReplay: a pending period based deterministic replay scheme
Proceedings of the 37th annual international symposium on Computer architecture
Formal specification of the OpenMP memory model
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
On the effectiveness of speculative and selective memory fences
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting locality: a flexible DSM approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Chip multithreaded consistency model
Journal of Computer Science and Technology
Understanding POWER multiprocessors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
The impact of memory models on software reliability in multiprocessors
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
View-Oriented parallel programming and view-based consistency
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
Debugging distributed shared memory applications
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Exclusive Access to Resources in Distributed Shared Memory Architecture
Fundamenta Informaticae - Concurrency Specification and Programming (CS&P)
Edge chasing delayed consistency: pushing the limits of weak memory models
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
Heterogeneous-race-free memory models
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.02 |
In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access latency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss. Write buffers are often included in a pipelined machine to avoid processor waiting on writes. In a shared memory multiprocessor, there are more advantages in buffering memory requests, since each memory access has to traverse the memory- processor interconnection and has to compete with memory requests issued by different processors. Buffering, however, can cause logical problems in multiprocessors. These problems are aggravated if each processor has a private memory in which shared writable data may be present, such as in a cache-based system or in a system with a distributed global memory. In this paper, we analyze the benefits and problems associated with the buffering of memory requests in shared memory multiprocessors. We show that the logical problem of buffering is directly related to the problem of synchronization. A simple model is presented to evaluate the performance improvement resulting from buffering.