Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Memory access buffering in multiprocessors
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A low-overhead coherence solution for multiprocessors with private cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
An economical solution to the cache coherence problem
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
VLSI Mesh Routing Systems
Tango introduction and tutorial
Tango introduction and tutorial
Verifying a Multiprocessor Cache Controller Using Random Case
Verifying a Multiprocessor Cache Controller Using Random Case
Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture
Computer - Special issue on cryptography
Hector: A Hierarchically Structured Shared-Memory Multiprocessor
Computer - Special issue on experimental research in computer architecture
Efficient Doacross execution on distributed shared-memory multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Coarse-grain parallel programming in Jade
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Distributed Shared Memory: A Survey of Issues and Algorithms
Computer - Distributed computing systems: separate resources acting as one
Proving sequential consistency of high-performance shared memories (extended abstract)
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Detecting violations of sequential consistency
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
User-level interprocess communication for shared memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Race-free interconnection networks and multiprocessor consistency
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems (TOCS)
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Delayed consistency and its effects on the miss rate of parallel programs
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
The Stanford Dash Multiprocessor
Computer
An evaluation of the Chandy-Misra-Bryant algorithm for digital logic simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special issue on parallel and distributed systems performance
A performance study of memory consistency models
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Towards a shared-memory massively parallel multiprocessor
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Comparative performance evaluation of cache-coherent NUMA and COMA architectures
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The turn model for adaptive routing
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Specifying non-blocking shared memories (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic software cache coherence through vectorization
ICS '92 Proceedings of the 6th international conference on Supercomputing
A performance evaluation of optimal hybrid cache coherency protocols
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Volume rendering on scalable shared-memory MIMD architectures
VVS '92 Proceedings of the 1992 workshop on Volume visualization
Willow: a scalable shared memory multiprocessor
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Cache consistency in hierarchical-ring-based multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
A scalable coherent cache system with a dynamic pointing scheme
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
A scalable snoopy coherence scheme on distributed shared-memory multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
The shared regions approach to software cache coherence on multiprocessors
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Computation migration: enhancing locality for distributed-memory parallel systems
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
The detection and elimination of useless misses in multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Adaptive cache coherency for detecting migratory shared data
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Evaluation of release consistent software distributed shared memory on emerging network technology
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
A parallel adaptive fast multipole method
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Micro benchmark analysis of the KSR1
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A distributed shared memory multiprocessor ASURA: memory and cache architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Hot spot analysis in large scale shared memory multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Exploiting cache affinity in software cache coherence
ICS '94 Proceedings of the 8th international conference on Supercomputing
Performance evaluation of hybrid hardware and software distributed shared memory protocols
ICS '94 Proceedings of the 8th international conference on Supercomputing
A quantitative analysis of cache policies for scalable network file systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The turn model for adaptive routing
Journal of the ACM (JACM)
Request Combining in Multiprocessors with Arbitrary Interconnection Networks
IEEE Transactions on Parallel and Distributed Systems
Evaluating the memory overhead required for COMA architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
LCM: memory system support for parallel language implementation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Implications of hierarchical N-body methods for multiprocessor architectures
ACM Transactions on Computer Systems (TOCS)
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Techniques for reducing consistency-related communication in distributed shared-memory systems
ACM Transactions on Computer Systems (TOCS)
The communication requirements of mutual exclusion
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
An analytic study of dynamic hardware and software cache coherence strategies
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Serverless network file systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Lazy release consistency for hardware-coherent multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Parallelizing Navier-Stokes computations on a variety of architecture platforms
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Empirical evaluation of the CRAY-T3D: a compiler perspective
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Serverless network file systems
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
A Trip-Based Multicasting Model in Wormhole-Routed Networks with Virtual Channels
IEEE Transactions on Parallel and Distributed Systems
Efficient LRU-Based Buffering in a LAN Remote Caching Architecture
IEEE Transactions on Parallel and Distributed Systems
On Effective Execution of Nonuniform DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Verification of FLASH cache coherence protocol by aggregation of distributed transactions
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Strategic directions in computer architecture
ACM Computing Surveys (CSUR) - Special ACM 50th-anniversary issue: strategic directions in computing research
Impact of Memory Contention on Dynamic Scheduling on NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Characterizing the Memory Behavior of Compiler-Parallelized Applications
IEEE Transactions on Parallel and Distributed Systems
Verification techniques for cache coherence protocols
ACM Computing Surveys (CSUR)
Design and performance of the Shasta distributed shared memory protocol
ICS '97 Proceedings of the 11th international conference on Supercomputing
The interaction of parallel programming constructs and coherence protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The Mercury Interconnect Architecture: a cost-effective infrastructure for high-performance servers
Proceedings of the 24th annual international symposium on Computer architecture
VM-based shared memory on low-latency, remote-memory-access networks
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Design issues of a cooperative cache with no coherence problems
Proceedings of the fifth workshop on I/O in parallel and distributed systems
Evaluating parallel logic programming systems on scalable multiprocessors
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
Design Verification of the S3.mp Cache-Coherent Shared-Memory System
IEEE Transactions on Computers
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations
The Journal of Supercomputing
Checkpointing Distributed Shared Memory
The Journal of Supercomputing - Special issue: high performance distributed computing
Adapting the Network Interface for High-Performance Computing: The CNI Approach
The Journal of Supercomputing - Special issue: high performance distributed computing
Performance monitoring in a Myrinet-connected SHRIMP cluster
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Formal verification of complex coherence protocols using symbolic state models
Journal of the ACM (JACM)
Memory consistency and event ordering in scalable shared-memory multiprocessors
25 years of the international symposia on Computer architecture (selected papers)
The DASH prototype: implementation and performance
25 years of the international symposia on Computer architecture (selected papers)
The turn model for adaptive routing
25 years of the international symposia on Computer architecture (selected papers)
The Stanford FLASH multiprocessor
25 years of the international symposia on Computer architecture (selected papers)
Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs
International Journal of Parallel Programming
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving the performance of bristled CC-NUMA systems using virtual channels and adaptivity
ICS '99 Proceedings of the 13th international conference on Supercomputing
Verifying Systems with Replicated Components in Mur&b.phiv;
Formal Methods in System Design
Ace: a language for parallel programming with customizable protocols
ACM Transactions on Computer Systems (TOCS)
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An asynchronous protocol for release consistent distributed shared memory systems
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Architecture and design of AlphaServer GS320
ACM SIGPLAN Notices
Hint-based cooperative caching
ACM Transactions on Computer Systems (TOCS)
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Parallel hierarchical molecular structure estimation
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A parallel Lauritzen-Spiegelhalter algorithm for probabilistic inference
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Parallel protein structure determination from uncertain data
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Simulation coverage enhancement using test stimulus transformation
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Hierarchical Scalable Photonic Architectures for High-Performance Processor Interconnection
IEEE Transactions on Computers
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Evaluation of Parallel Copying Garbage Collection on a Shared-Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Replication Algorithms in a Remote Caching Architecture
IEEE Transactions on Parallel and Distributed Systems
Access Graphs: A Model for Investigating Memory Consistency
IEEE Transactions on Parallel and Distributed Systems
Hardware Versus Software Implementation of COMA
ICPP '97 Proceedings of the international Conference on Parallel Processing
An adaptive sequential prefetching scheme in shared-memory multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Formal Verification of Delayed Consistency Protocols
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Eliminating Stale Data References through Array Data-Flow Analysis
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Categorizing Network Traffic in Update-Based Protocols on Scalable Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Evaluation of a Commercial CC-NUMA Architecture: The CONVEX Exemplar SPP1200
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Coherent Block Data Transfer in the FLASH Multiprocessor
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Efficient Handling of Message-Dependent Deadlock
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The Influence of Architectural Parameters on the Performance of Parallel Logic Programming Systems
PADL '99 Proceedings of the First International Workshop on Practical Aspects of Declarative Languages
How Can We Design Better Networks for DSM Systems?
PCRCW '97 Proceedings of the Second International Workshop on Parallel Computer Routing and Communication
The Impact of Cache Coherence Protocols on Systems using Fine-Grain Data Synchronization
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Parallel Computation: MM +/- X
Informatics - 10 Years Back. 10 Years Ahead.
The Impact of Cache Coherence Protocols on Parallel Logic Programming Systems
CL '00 Proceedings of the First International Conference on Computational Logic
FMCAD '98 Proceedings of the Second International Conference on Formal Methods in Computer-Aided Design
A Progressive Approach to Handling Message-Dependent Deadlock in Parallel Computer Systems
IEEE Transactions on Parallel and Distributed Systems
Scalability in computing for today and tomorrow
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Hardware Controlled Prefeching in Directory-Based Cache Coherent Systems
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Software cache coherence for large scale multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
An Initial evaluation of the Convex SPP-1000 for Earth and Space Science Applications
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Using memory-mapped network interfaces to improve the performance of distributed shared memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Cost-Sensitive Cache Replacement Algorithms
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Hierarchical Memory Directory Scheme Via Extending SCI for Large-Scale Multiprocessors
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
CNI: A High-Performance Network Interface for Workstation Clusters
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Portable transparent checkpointing for distributed shared memory
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Evaluation of cache consistency algorithm performance
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
SPIRAL: A Client-Transparent Third-Party Transfer Scheme for Network Attached Disks
MSS '03 Proceedings of the 20 th IEEE/11 th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)
Aizu supercomputer: a massively parallel system for virtual reality problems
PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Reduced Overhead Logging for Rollback Recovery in Distributed Shared Memory
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
A Lattice Based Framework of Shared Memory Consistency Models
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Verifying Sequential Consistency on Shared-Memory Multiprocessors by Model Checking
IEEE Transactions on Parallel and Distributed Systems
Integrating applications with cache and memory management on a shared-memory multiprocessor
CASCON '92 Proceedings of the 1992 conference of the Centre for Advanced Studies on Collaborative research - Volume 1
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
A unified theory of shared memory consistency
Journal of the ACM (JACM)
Proving refinement using transduction
Distributed Computing - Special issue: Verification of lazy caching
Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs
Proceedings of the 42nd annual Design Automation Conference
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Efficient address remapping in distributed shared-memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
Circulating shared-registers for multiprocessor systems
Journal of Systems Architecture: the EUROMICRO Journal
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
An Integrated Methodology for the Verification of Directory-Based Cache Protocols
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Performance of Switch Blocking on Multithreaded Architectures
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Cooperative caching: using remote client memory to improve file system performance
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Distributed texture memory in a multi-GPU environment
GH '06 Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Reducing the Interconnection Network Cost of Chip Multiprocessors
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
Extending CC-NUMA systems to support write update optimizations
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Distributed cooperative caching
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Efficient shared-memory support for parallel graph reduction
Future Generation Computer Systems
An asymmetric distributed shared memory model for heterogeneous parallel systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Efficient methods for formally verifying safety properties of hierarchical cache coherence protocols
Formal Methods in System Design
The connection-then-credit flow control protocol for heterogeneous multicore systems-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special issue on the 2009 ACM/IEEE international symposium on networks-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Fractal Coherence: Scalably Verifiable Cache Coherence
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
NoC-aware cache design for multithreaded execution on tiled chip multiprocessors
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
A new hybrid directory scheme for shared memory multi-processors
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
The data diffusion space for parallel computing in clusters
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Manager-client pairing: a framework for implementing coherence hierarchies
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
TM2C: a software transactional memory for many-cores
Proceedings of the 7th ACM european conference on Computer Systems
An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Proceedings of the Tenth International Symposium on Code Generation and Optimization
The Journal of Supercomputing
High-performance fractal coherence
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.02 |
DASH is a scalable shared-memory multiprocessor currently being developed at Stanford's Computer Systems Laboratory. The architecture consists of powerful processing nodes, each with a portion of the shared-memory, connected to a scalable interconnection network. A key feature of DASH is its distributed directory-based cache coherence protocol. Unlike traditional snoopy coherence protocols, the DASH protocol does not rely on broadcast; instead it uses point-to-point messages sent between the processors and memories to keep caches consistent. Furthermore, the DASH system does not contain any single serialization or control point. While these features provide the basis for scalability, they also force a reevaluation of many fundamental issues involved in the design of a protocol. These include the issues of correctness, performance and protocol complexity. In this paper, we present the design of the DASH coherence protocol and discuss how it addresses the above issues. We also discuss our strategy for verifying the correctness of the protocol and briefly compare our protocol to the IEEE Scalable Coherent Interface protocol.