Portable programs for parallel processors
Portable programs for parallel processors
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The Stanford Dash Multiprocessor
Computer
The design and analysis of DASH: a scalable directory-based multiprocessor
The design and analysis of DASH: a scalable directory-based multiprocessor
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A low-overhead coherence solution for multiprocessors with private cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
VLSI Mesh Routing Systems
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Application-controlled physical memory using external page-cache management
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Volume rendering on scalable shared-memory MIMD architectures
VVS '92 Proceedings of the 1992 workshop on Volume visualization
Data locality and load balancing in COOL
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Recent trends in experimental operating systems research
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
EMC-Y: parallel processing element optimizing communication and computation
ICS '93 Proceedings of the 7th international conference on Supercomputing
Minimal adaptive routing on the mesh with bounded queue size
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Programming, compilation, and resource management issues for multithreading (panel session II)
ACM SIGARCH Computer Architecture News - Special issue: panel sessions of the 1991 workshop on multithreaded computers
Exploring the design space for a shared-cache multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A performance evaluation of lock-free synchronization protocols
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ROMM routing on mesh and torus networks
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines
ICS '95 Proceedings of the 9th international conference on Supercomputing
An analytical model of high performance superscalar-based multiprocessors
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Proceedings of the 28th annual international symposium on Microarchitecture
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Fine grain parallel communication on general purpose LANs
ICS '96 Proceedings of the 10th international conference on Supercomputing
Symphony: a simulation backplane for parallel mixed-mode co-simulation of VLSI systems
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Network-Based Multicomputers: A Practical Supercomputer Architecture
IEEE Transactions on Parallel and Distributed Systems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving parallel shear-warp volume rendering on shared address space multiprocessors
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Design and implementation of the NUMAchine multiprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
A methodology and an evaluation of the SGI Origin2000
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
The design, implementation, and evaluation of Jade
ACM Transactions on Programming Languages and Systems (TOPLAS)
Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Workload Execution Strategies and Parallel Speedup on Clustered Computers
IEEE Transactions on Computers
Performance experiences on Sun's Wildfire prototype
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Hardware spatial forwarding for widely shared data
Proceedings of the 14th international conference on Supercomputing
Mapping Parallel Application Communication Topology to Rhombic Overlapping-Cluster Multiprocessors
The Journal of Supercomputing
Submesh Determination in Faulty Tori and Meshes
IEEE Transactions on Parallel and Distributed Systems
Strategies optimization and integration in DSM
ACM SIGOPS Operating Systems Review
WWW visualisation of computer architecture simulations
Proceedings of the 7th annual conference on Innovation and technology in computer science education
Improving the performance of DSM systems via compiler involvement
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
An Evaluation of a Commercial CC-NUMA Architecture: The CONVEX Exemplar SPP1200
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Distributed Submesh Determination in Faulty Tori and Meshes
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Simulating the DASH Architecture in HASE
SS '96 Proceedings of the 29th Annual Simulation Symposium (SS '96)
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Distance-aware L2 cache organizations for scalable multiprocessor systems
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Reconfigurable embedded systems: Synthesis, design and application
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
HP scalable computing architecture
WIESS'00 Proceedings of the 1st conference on Industrial Experiences with Systems Software - Volume 1
The Power of Priority: NoC Based Distributed Cache Coherency
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
HiPEC: high performance external virtual memory caching
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
On the importance of parallel application placement in NUMA multiprocessors
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
False sharing and its effect on shared memory performance
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
The case for simple, visible cache coherency
Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
Matrix: adaptive middleware for distributed multiplayer games
Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware
Matrix: adaptive middleware for distributed multiplayer games
Middleware'05 Proceedings of the ACM/IFIP/USENIX 6th international conference on Middleware
Hi-index | 0.00 |
The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators are useful for understanding many high-level design trade-offs, prototypes are essential to ensure that no critical details are overlooked. A prototype provides convincing evidence of the feasibility of the design allows one to accurately estimate both the hardware and the complexity cost of various features, and provides a platform for studying real workloads. A 16-processor prototype of the DASH multiprocessor has been operational for the last six months. In this paper, the hardware overhead of directory-based cache coherence in the prototype is examined. We also discuss the performance of the system, and the speedups obtained by parallel applications running on the prototype. Using a sophisticated hardware performance monitor, we characterize the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup.