Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simulation of multiprocessors: accuracy and performance
Simulation of multiprocessors: accuracy and performance
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache coherence directories for scalable multiprocessors
Cache coherence directories for scalable multiprocessors
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Integration of message passing and shared memory in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR
THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR
Support for Speculative Execution in High-Performance Processors
Support for Speculative Execution in High-Performance Processors
SimOS: A Fast Operating System Simulation Environment
SimOS: A Fast Operating System Simulation Environment
Loading databases using dataflow parallelism
ACM SIGMOD Record
Integration of message passing and shared memory in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines
ICS '95 Proceedings of the 9th international conference on Supercomputing
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware fault containment in scalable shared-memory multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Digital system simulation: methodologies and examples
DAC '98 Proceedings of the 35th annual Design Automation Conference
In-memory directories: eliminating the cost of directories in CC-NUMAs
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
IEEE Transactions on Computers - Special issue on cache memory and related problems
Coherence Controller Architectures for Scalable Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
FLASH vs. (simulated) FLASH: closing the simulation loop
ACM SIGPLAN Notices
Memory Hierarchy Considerations for Cost-Effective Cluster Computing
IEEE Transactions on Computers
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Optimizing software cache-coherent cluster architectures
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Leveraging cache coherence in active memory systems
ICS '02 Proceedings of the 16th international conference on Supercomputing
An Application-Driven Study of Multicast Communication for Write Invalidation
The Journal of Supercomputing
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
How Much Does Network Contention Affect Distributed Shared Memory Performance?
ICPP '97 Proceedings of the international Conference on Parallel Processing
Hardware Versus Software Implementation of COMA
ICPP '97 Proceedings of the international Conference on Parallel Processing
Coherent Block Data Transfer in the FLASH Multiprocessor
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Processor Mechanisms for Software Shared Memory
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Scalability in computing for today and tomorrow
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
A Hierarchical Memory Directory Scheme Via Extending SCI for Large-Scale Multiprocessors
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
The Thread-Based Protocol Engines for CC-NUMA Multiprocessors
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation
IEEE Transactions on Computers
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Spinach: a liberty-based simulator for programmable network interface architectures
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
Journal of Systems Architecture: the EUROMICRO Journal
Programmable bus/memory controllers in modern computer architecture
Proceedings of the 43rd annual Southeast regional conference - Volume 1
A new trace-driven shared-memory multiprocessors machine simulator
International Journal of Computers and Applications
An asymmetric distributed shared memory model for heterogeneous parallel systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Hi-index | 0.01 |
A flexible communication mechanism is a desirable feature in multiprocessors because it allows support for multiple communication protocols, expands performance monitoring capabilities, and leads to a simpler design and debug process. In the Stanford FLASH multiprocessor, flexibility is obtained by requiring all transactions in a node to pass through a programmable node controller, called MAGIC. In this paper, we evaluate the performance costs of flexibility by comparing the performance of FLASH to that of an idealized hardwired machine on representative parallel applications and a multiprogramming workload. To measure the performance of FLASH, we use a detailed simulator of the FLASH and MAGIC designs, together with the code sequences that implement the cache-coherence protocol. We find that for a range of optimized parallel applications the performance differences between the idealized machine and FLASH are small. For these programs, either the miss rates are small or the latency of the programmable protocol can be hidden behind the memory access time. For applications that incur a large number of remote misses or exhibit substantial hot-spotting, performance is poor for both machines, though the increased remote access latencies or the occupancy of MAGIC lead to lower performance for the flexible design. In most cases, however, FLASH is only 2%–12% slower than the idealized machine.