This paper investigates the performance of word-packet, slotted, unidirectional ring-based hierarchical direct networks in the context of large-scale shared-memory multiprocessors. Slotted unidirectional rings are attractive because their electrical characteristics and simple interfaces allow fast cycle times and large bandwidths. Large-scale systems need multiple rings to increase aggregate bandwidth. Hierarchies are attractive because the topology provides a unique path between any two nodes and allows simple node interfaces and simple inter-ring connections. To keep the study within a realistic region of the design space, the network architecture of the Hector prototype is adopted as the initial design point. A simulator of that architecture has been developed and validated against measurements from the prototype. The system and workload parameters reflect conditions expected in the near future. The results of this study show the importance of system balance to performance.
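The unique-path property of a ring hierarchy can be illustrated with a small sketch. It assumes a hypothetical addressing scheme, not taken from the paper or from Hector. Each station is named by a tuple of slot positions, one per level from the top ring down. Every ring below the top reserves one extra slot for an inter-ring interface (IRI) to its parent. `branching`, `route` and `hop_count` are illustrative names. A message climbs from its source to the lowest ring the two stations share, crosses that ring, then descends. Because each ring is unidirectional, every leg has exactly one way to go.

```python
from typing import List, Tuple

Address = Tuple[int, ...]  # hypothetical: one slot index per level, top ring first


def ring_size(branching: List[int], level: int) -> int:
    # Assumption: the top ring holds only its children; every lower ring
    # also carries one IRI slot connecting it to the ring one level up.
    return branching[level] + (1 if level > 0 else 0)


def iri_position(branching: List[int], level: int) -> int:
    # Assumption: the IRI occupies the slot just after the last child slot.
    return branching[level]


def route(branching: List[int], src: Address, dst: Address) -> List[Tuple[Address, int, int]]:
    """Return the single path as (ring_id, entry_slot, exit_slot) legs."""
    depth = len(branching)
    assert len(src) == len(dst) == depth
    # k = level of the lowest ring that contains both stations.
    k = 0
    while k < depth and src[k] == dst[k]:
        k += 1
    if k == depth:
        return []  # source and destination are the same station
    legs = []
    # Climb: on each ring below level k, travel from our slot to the IRI.
    for level in range(depth - 1, k, -1):
        legs.append((src[:level], src[level], iri_position(branching, level)))
    # Cross the lowest common ring from the source branch to the destination branch.
    legs.append((src[:k], src[k], dst[k]))
    # Descend: on each ring below level k, travel from the IRI to the target slot.
    for level in range(k + 1, depth):
        legs.append((dst[:level], iri_position(branching, level), dst[level]))
    return legs


def hop_count(branching: List[int], src: Address, dst: Address) -> int:
    # Slot-to-slot hops only; the IRI transfer between rings is not counted.
    total = 0
    for ring_id, enter, leave in route(branching, src, dst):
        level = len(ring_id)
        # Unidirectional ring: the distance is always measured "forward".
        total += (leave - enter) % ring_size(branching, level)
    return total
```

For example, with `branching = [4, 4]` (a top ring of four local rings, each with four stations), moving from station `(0, 1)` to `(0, 3)` takes 2 hops, but the return trip takes 3. A unidirectional ring must be traversed the long way around, so distances are asymmetric even though each path is unique.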