The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. The hardware overhead of directory-based cache coherence in a 48-processor prototype is examined. The data show that this overhead is only about 10-15%, which appears to be a small cost for the ease of programming that coherent caches offer and for their potential to deliver higher performance. The performance of the system is then discussed, and the speedups achieved by a variety of parallel applications running on the prototype are presented. Using a sophisticated hardware performance monitor, the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup are characterized. Finally, the optimizations incorporated in the DASH protocol are evaluated in terms of their effectiveness on parallel applications and on atomic tests that stress the memory system.
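To make "directory-based cache coherence" concrete, the sketch below models a single directory entry as a generic three-state invalidation directory with a full bit-vector of presence bits, one per cluster. This is an illustrative assumption, not the DASH protocol itself. The abstract does not spell out the entry format or the protocol optimizations it evaluates, and the names `DirectoryEntry`, `State`, and `directory_overhead` are hypothetical. The helper `directory_overhead` counts only directory *storage* (presence bits plus one assumed dirty bit per block). That is a different quantity from the hardware logic overhead the abstract reports, so its result should not be read as reproducing the 10-15% figure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    UNCACHED = "uncached"  # no cluster holds a copy of the block
    SHARED = "shared"      # one or more clusters hold read-only copies
    DIRTY = "dirty"        # exactly one cluster holds the only, modified copy


@dataclass
class DirectoryEntry:
    """Hypothetical full bit-vector directory entry for one memory block."""
    num_clusters: int
    state: State = State.UNCACHED
    presence: int = 0  # bit i set => cluster i may hold a copy

    def sharers(self) -> List[int]:
        """Clusters whose presence bit is set, in ascending order."""
        return [i for i in range(self.num_clusters) if (self.presence >> i) & 1]

    def read(self, requester: int) -> List[int]:
        """Handle a read miss; return the clusters that must supply the dirty data."""
        owners = self.sharers() if self.state is State.DIRTY else []
        # After the read, the former owner (if any) and the requester share the block.
        self.state = State.SHARED
        self.presence |= 1 << requester
        return owners

    def write(self, requester: int) -> List[int]:
        """Handle a write (read-exclusive) miss; return the clusters to invalidate."""
        to_invalidate = [c for c in self.sharers() if c != requester]
        # The requester becomes the sole owner of a modified copy.
        self.state = State.DIRTY
        self.presence = 1 << requester
        return to_invalidate


def directory_overhead(num_clusters: int, block_bytes: int) -> float:
    """Directory storage bits per block (presence bits + 1 dirty bit) over data bits."""
    return (num_clusters + 1) / (block_bytes * 8)


if __name__ == "__main__":
    entry = DirectoryEntry(num_clusters=16)
    entry.read(2)
    entry.read(5)
    print(entry.sharers())      # [2, 5]
    print(entry.write(7))       # [2, 5]  (invalidations sent to the old sharers)
    print(entry.read(3))        # [7]     (the dirty owner supplies the data)
    print(entry.sharers())      # [3, 7]
    print(directory_overhead(16, 16))  # 0.1328125 (storage only, assumed parameters)
```

A full bit-vector keeps invalidation simple, because the directory knows exactly which clusters to contact. The trade-off is that its storage grows linearly with the number of clusters, which is why limited-pointer and other memory-efficient directory schemes exist for larger machines.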