Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Efficient synchronization primitives for large-scale cache-coherent multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Computer
Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture
Computer - Special issue on cryptography
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Volume rendering on scalable shared-memory MIMD architectures
VVS '92 Proceedings of the 1992 workshop on Volume visualization
Heterogeneous parallel programming in Jade
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Dynamic object management for distributed data structures
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Integrating message-passing and shared-memory: early experience
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data locality and load balancing in COOL
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
PADS '93 Proceedings of the seventh workshop on Parallel and distributed simulation
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
ACM SIGOPS Operating Systems Review
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Issues and directions in scalable parallel computing
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Mechanisms for cooperative shared memory
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Object distribution in Orca using Compile-Time and Run-Time techniques
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
Super-threading: architectural and software mechanisms for optimizing parallel computation
ICS '93 Proceedings of the 7th international conference on Supercomputing
Integrating volume data analysis and rendering on distributed memory architectures
PRS '93 Proceedings of the 1993 symposium on Parallel rendering
Parallel volume-rendering algorithm performance on mesh-connected multicomputers
PRS '93 Proceedings of the 1993 symposium on Parallel rendering
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Developing parallel applications using high-performance simulation
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
Cache coherence using local knowledge
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compiling for shared-memory and message-passing computers
ACM Letters on Programming Languages and Systems (LOPLAS)
On testing cache-coherent shared memories
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Performance evaluation of hybrid hardware and software distributed shared memory protocols
ICS '94 Proceedings of the 8th international conference on Supercomputing
Design and implementation of a prototype optical deflection network
SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications
A comparison of message passing and shared memory architectures for data parallel programs
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software versus hardware shared-memory implementation: a case study
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Parallel sorting by over partitioning
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simple compiler algorithms to reduce ownership overhead in cache coherence protocols
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Mixed consistency: a model for parallel programming (extended abstract)
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
The design of RPM: an FPGA-based multiprocessor emulator
FPGA '95 Proceedings of the 1995 ACM third international symposium on Field-programmable gate arrays
A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
IBM Systems Journal
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Efficient shared memory with minimal hardware support
ACM SIGARCH Computer Architecture News
Memory system performance of UNIX on CC-NUMA multiprocessors
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
On characterizing bandwidth requirements of parallel applications
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hive: fault containment for shared-memory multiprocessors
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Lazy release consistency for hardware-coherent multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Data forwarding in scalable shared-memory multiprocessors
ICS '95 Proceedings of the 9th international conference on Supercomputing
Multithreading with the EM-4 distributed-memory multiprocessor
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves
IEEE Transactions on Parallel and Distributed Systems
Teapot: language support for writing memory coherence protocols
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
pHluid: the design of a parallel functional language implementation on workstations
Proceedings of the first ACM SIGPLAN international conference on Functional programming
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
MGS: a multigrain shared memory system
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Limits on the performance benefits of multithreading and prefetching
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols
ACM Transactions on Programming Languages and Systems (TOPLAS)
Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements
IEEE Transactions on Parallel and Distributed Systems
A flexible operation execution model for shared distributed objects
Proceedings of the 11th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
The GLOW cache coherence protocol extensions for widely shared data
ICS '96 Proceedings of the 10th international conference on Supercomputing
Evaluating virtual channels for cache-coherent shared-memory multiprocessors
ICS '96 Proceedings of the 10th international conference on Supercomputing
Conservative circuit simulation on shared-memory multiprocessors
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
State reduction using reversible rules
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Integrating formal verification methods with a conventional project design flow
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Fast Parallel Sorting Under LogP: Experience with the CM-5
IEEE Transactions on Parallel and Distributed Systems
The Block Distributed Memory Model
IEEE Transactions on Parallel and Distributed Systems
An Architecture for Tolerating Processor Failures in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Scheduler-conscious synchronization
ACM Transactions on Computer Systems (TOCS)
Data Forwarding in Scalable Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Compressionless Routing: A Framework for Adaptive and Fault-Tolerant Routing
IEEE Transactions on Parallel and Distributed Systems
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
A performance evaluation of cluster architectures
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
HFS: a performance-oriented flexible file system based on building-block compositions
ACM Transactions on Computer Systems (TOCS)
Parallel breadth-first BDD construction
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Relaxed consistency and coherence granularity in DSM systems: a performance evaluation
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The Mercury Interconnect Architecture: a cost-effective infrastructure for high-performance servers
Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
A Survey of Recoverable Distributed Shared Virtual Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Parallel hierarchical computation of specular radiosity
PRS '97 Proceedings of the IEEE symposium on Parallel rendering
An interaction of coherence protocols and memory consistency models in DSM systems
ACM SIGOPS Operating Systems Review
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
A Cost and Speed Model for k-ary n-Cube Wormhole Routers
IEEE Transactions on Parallel and Distributed Systems
Digital system simulation: methodologies and examples
DAC '98 Proceedings of the 35th annual Design Automation Conference
A study of three dynamic approaches to handle widely shared data in shared-memory multiprocessors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
The DASH prototype: implementation and performance
25 years of the international symposia on Computer architecture (selected papers)
The Stanford FLASH multiprocessor
25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
Wormhole routing techniques for directly connected multicomputer systems
ACM Computing Surveys (CSUR)
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
IEEE Transactions on Computers - Special issue on cache memory and related problems
IEEE Transactions on Computers - Special issue on cache memory and related problems
Coherence Controller Architectures for Scalable Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Excel-NUMA: Toward Programmability, Simplicity, and High Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements
IEEE Transactions on Parallel and Distributed Systems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
Formal verification in hardware design: a survey
ACM Transactions on Design Automation of Electronic Systems (TODAES)
The QRQW PRAM: accounting for contention in parallel algorithms
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
IBM Systems Journal
Teapot: A Domain-Specific Language for Writing Cache Coherence Protocols
IEEE Transactions on Software Engineering
PiSMA: a parallel VSM architecture
Crossroads
Improving parallel system performance by changing the arrangement of the network links
Proceedings of the 14th international conference on Supercomputing
An asynchronous protocol for release consistent distributed shared memory systems
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
The Odd-Even Turn Model for Adaptive Routing
IEEE Transactions on Parallel and Distributed Systems
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
IEEE Transactions on Computers
Dynamic Task Scheduling Using Online Optimization
IEEE Transactions on Parallel and Distributed Systems
Exploiting Network Locality for CC-NUMA Multiprocessors
The Journal of Supercomputing
Proceedings of the 2001 ACM symposium on Applied computing
Parallelizing the Murϕ Verifier
Formal Methods in System Design - Special issue on CAV '97
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
A Cost-Effective Approach to Deadlock Handling in Wormhole Networks
IEEE Transactions on Parallel and Distributed Systems
A Fast and Efficient Processor Allocation Scheme for Mesh-Connected Multicomputers
IEEE Transactions on Computers
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Deriving Efficient Cache Coherence Protocols Through Refinement
Formal Methods in System Design
Modeling of interconnection subsystems for massively parallel computers
Performance Evaluation
Tolerating node failures in cache only memory architectures
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Compiler Support for Array Distribution onNUMA Shared Memory Multiprocessors
The Journal of Supercomputing
The Journal of Supercomputing - Special issue on embedded fault-tolerance systems
An Application-Driven Study of Multicast Communication for Write Invalidation
The Journal of Supercomputing
Load Balancing for Parallel Query Execution on NUMA Multiprocessors
Distributed and Parallel Databases
A Simulation Study of Hardware-Oriented DSM Approaches
IEEE Parallel & Distributed Technology: Systems & Technology
Distributed Shared Memory: Concepts and Systems
IEEE Parallel & Distributed Technology: Systems & Technology
Performance Analysis of Cluster-Based Multiprocessors
IEEE Transactions on Computers
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Mesh Interconnection Networks with Deterministic Routing
IEEE Transactions on Parallel and Distributed Systems
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Packet Synchronization for Synchronous Optical Deflection-Routed Interconnection Networks
IEEE Transactions on Parallel and Distributed Systems
Encapsulation of Parallelism and Architecture-Independence in Extensible Database Query Execution
IEEE Transactions on Software Engineering
A General Data Layout for Distributed Consistency in Data Parallel Applications
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Network Performance under Physical Constraints
ICPP '97 Proceedings of the international Conference on Parallel Processing
Kiloprocessor Extensions to SCI
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Benefits of Processor Clustering in Designing Large Parallel Systems: When and How?
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Deadlock- and Livelock-Free Routing Protocols for Wave Switching
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
IWCC '01 Proceedings of the NATO Advanced Research Workshop on Advanced Environments, Tools, and Applications for Cluster Computing-Revised Papers
Embedded Cluster Computing through Dynamic Reconfigurability of Inter-Processor Connections
IWCC '01 Proceedings of the NATO Advanced Research Workshop on Advanced Environments, Tools, and Applications for Cluster Computing-Revised Papers
A High-Level Programming Environment for Distributed Memory Architectures
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
A Parallel System Architecture Based on Dynamically Configurable Shared Memory Clusters
PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
Task Scheduling for Dynamically Configurable Multiple SMP Clusters Based on Extended DSC Approach
PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
How Can We Design Better Networks for DSM Systems?
PCRCW '97 Proceedings of the Second International Workshop on Parallel Computer Routing and Communication
A Compiler-Assisted Scheme for Adaptive Cache Coherence Enforcement
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Measuring Consistency Costs for Distributed Shared Data
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Active Memory Clusters: Efficient Multiprocessing on Commodity Clusters
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
SMP system interconnect instrumentation for performance analysis
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Efficient synchronization for nonuniform communication architectures
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Processor Allocation in the Mesh Multiprocessors Using the Leapfrog Method
IEEE Transactions on Parallel and Distributed Systems
Cluster Queue Structure for Shared-Memory Multiprocessor Systems
The Journal of Supercomputing
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Complexity and Performance in Parallel Programming Languages
HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Efficient and balanced adaptive routing in two-dimensional meshes
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Abstracting network characteristics and locality properties of parallel systems
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software cache coherence for large scale multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Cache Coherency Protocol for Optically Connected Parallel Computer Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Using memory-mapped network interfaces to improve the performance of distributed shared memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Hierarchical Backoff Locks for Nonuniform Communication Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Using cache optimizing compiler for managing software cache on distributed shared memory system
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Trojan: A High-Performance Simulator for Shared Memory Architectures
SS '96 Proceedings of the 29th Annual Simulation Symposium (SS '96)
The Thread-Based Protocol Engines for CC-NUMA Multiprocessors
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Modeling and evaluating the time overhead induced by BER in COMA multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
(De-) Clustering Objects for Multiprocessor System Software
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation
IEEE Transactions on Computers
IEEE Transactions on Parallel and Distributed Systems
Sourcebook of parallel computing
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Stateful distributed interposition
ACM Transactions on Computer Systems (TOCS)
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
CAS-DSM: a compiler assisted software distributed shared memory
International Journal of Parallel Programming
Towards scalable collective communication for multicomputer interconnection networks
Information Sciences: an International Journal - Special issue: Information technology
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Speculative Incoherent Cache Protocols
IEEE Micro
Prediction model for evaluation of reconfigurable interconnects in distributed shared-memory systems
Proceedings of the 2005 international workshop on System level interconnect prediction
Traffic Temporal Analysis for Reconfigurable Interconnects in Shared-Memory Systems
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Sequential consistency and the lazy caching algorithm
Distributed Computing - Special issue: Verification of lazy caching
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Formal Verification and its Impact on the Snooping versus Directory Protocol Debate
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Congestion modeling for reconfigurable inter-processor networks
Proceedings of the 2006 international workshop on System-level interconnect prediction
Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Optimal and efficient parallel tridiagonal solvers using direct methods
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
International Journal of Parallel Programming
Efficiently generating test vectors with state pruning
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
An efficient cache design for scalable glueless shared-memory multiprocessors
Proceedings of the 3rd conference on Computing frontiers
Symmetry in temporal logic model checking
ACM Computing Surveys (CSUR)
A plane-based broadcast algorithm for multicomputer networks
Journal of Systems Architecture: the EUROMICRO Journal
On balancing network traffic in path-based multicast communication
Future Generation Computer Systems - Systems performance analysis and evaluation
Tight Bounds for Critical Sections in Processor Consistent Platforms
IEEE Transactions on Parallel and Distributed Systems
Support for High-Frequency Streaming in CMPs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
K42: building a complete operating system
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Probabilistic analysis on mesh network fault tolerance
Journal of Parallel and Distributed Computing
Predicting reconfigurable interconnect performance in distributed shared-memory systems
Integration, the VLSI Journal
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
BulkSC: bulk enforcement of sequential consistency
Proceedings of the 34th annual international symposium on Computer architecture
Comparison of Mesh and Hierarchical Networks for Multiprocessors
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Reducing the Write Traffic for a Hybrid Cache Protocol
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
The Power of Priority: NoC Based Distributed Cache Coherency
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
The design and evaluation of a shared object system for distributed memory machines
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Brazos: a third generation DSM system
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Experience with a language for writing coherence protocols
DSL'97 Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997
Experience distributing objects in an SMMP OS
ACM Transactions on Computer Systems (TOCS)
The mechanics of in-kernel synchronization for a scalable microkernel
ACM SIGOPS Operating Systems Review
Microprocessors & Microsystems
Cache coherency communication cost in a NoC-based MPSoC platform
Proceedings of the 20th annual conference on Integrated circuits and systems design
Performance of deterministic and adaptive broadcast algorithms in multicomputer networks
International Journal of High Performance Computing and Networking
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
IBM Journal of Research and Development
Scientific Programming - High Performance Computing for Mission-Enabling Space Applications
25 Years of Model Checking
Journal of Parallel and Distributed Computing
Dynamic security domain scaling on embedded symmetric multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Disaggregated memory for expansion and sharing in blade servers
Proceedings of the 36th annual international symposium on Computer architecture
A Novel Cache Organization for Tiled Chip Multiprocessor
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Microprocessors & Microsystems
A tuneable software cache coherence protocol for heterogeneous MPSoCs
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
High-throughput coherence control and hardware messaging in everest
IBM Journal of Research and Development
Fault-tolerant mapping of a mesh network in a flexible hypercube
WSEAS Transactions on Computers
Lower bounds on the connectivity probability for 2-D mesh networks
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Proceedings of the 7th ACM international conference on Computing frontiers
Fault-tolerant meshes and tori embedded in a faulty supercube
WSEAS Transactions on Computers
Scalable hardware support for conditional parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing
Dynamic SMP clusters with communication on the fly
ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing
Architectural support for thread communications in multi-core processors
Parallel Computing
Architectural Support for Fair Reader-Writer Locking
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Performance comparison of some shared memory organizations for 2D mesh-like NOCs
Microprocessors & Microsystems
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Embedding meshes into twisted-cubes
Information Sciences: an International Journal
WSEAS Transactions on Information Science and Applications
Probabilistic analysis of time reduction by eliminating barriers in parallel programmes
International Journal of Communication Networks and Distributed Systems
On fault-tolerant embedding of meshes and tori in a flexible hypercube with unbounded expansion
WSEAS TRANSACTIONS on SYSTEMS
A hardware supported multicast scheme based on XY routing for 2-D mesh InfiniBand networks
The Journal of Supercomputing
View-Oriented parallel programming and view-based consistency
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
Upper bounds on the connection probability for 2-D meshes and tori
Journal of Parallel and Distributed Computing
Speeding-up synchronizations in DSM multiprocessors
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
On consistency of encrypted files
DISC'06 Proceedings of the 20th international conference on Distributed Computing
Barrier elimination based on access dependency analysis for OpenMP
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Improving coherence protocol reactiveness by trading bandwidth for latency
Proceedings of the 9th conference on Computing Frontiers
Distributed memory virtualization with the use of SDDSfL
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Computers and Electrical Engineering
Exploring topologies for source-synchronous ring-based network-on-chip
Proceedings of the Conference on Design, Automation and Test in Europe
A heterogeneous multiple network-on-chip design: an application-aware approach
Proceedings of the 50th Annual Design Automation Conference
Location-aware cache management for many-core processors with deep cache hierarchy
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Deflection routing in 3D network-on-chip with limited vertical bandwidth
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
OCTET: capturing and controlling cross-thread dependences efficiently
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Using in-flight chains to build a scalable cache coherence protocol
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 4.13 |
The overall goals and major features of the directory architecture for shared memory (Dash) are presented. The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches. The Dash architecture is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This performance results from distributing the memory among processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance. A distributed directory-based protocol that provides cache coherence without compromising scalability is discussed in detail. The Dash prototype machine and the corresponding software support are described.