This paper presents a preliminary performance analysis of a new large-scale multiprocessor: the Wisconsin Multicube. A key characteristic of the machine is that it is based on shared buses and a snooping cache coherence protocol. The organization of the shared buses and shared memory is unique and non-hierarchical. The two-dimensional version of the architecture is envisioned as scaling to 1024 processors.

We develop an approximate mean-value analysis of bus interference for the proposed cache coherence protocol. The model includes FCFS scheduling at the bus queues with deterministic bus access times, and asynchronous memory write-backs and invalidation requests.

We use our model to investigate the feasibility of the multiprocessor, and to study some initial system design issues. Our results indicate that a 1024-processor system can operate at 75-95% of its peak processing power if the mean time between cache misses is larger than 1000 bus cycles (i.e., 50 microseconds for 20 MHz buses; 25 microseconds for 40 MHz buses). This miss rate is not unreasonable for the cache sizes specified in the design, which are comparable to main memory sizes in existing multiprocessors. We also present results which address the issues of optimal cache block size, optimal size of the two-dimensional Multicube, the effect of broadcast invalidations on system performance, and the viability of several hardware techniques for reducing the latency for remote memory requests.
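The cycle-to-time conversion quoted above, and the general flavor of a mean-value analysis, can be illustrated with a minimal Python sketch. The first function only reproduces the arithmetic in the abstract (1000 bus cycles at 20 MHz and 40 MHz). The second is the textbook exact MVA recursion for a single FCFS queue fed by N processors, which is an assumption chosen for illustration: it is not the approximate model developed in the paper, which covers a grid of buses, deterministic access times, and asynchronous write-backs and invalidations. The function names and the example parameters (32 processors, a 20-cycle bus service time) are hypothetical.

```python
def cycles_to_microseconds(cycles: float, bus_mhz: float) -> float:
    """Convert a number of bus cycles to microseconds for a bus clocked at bus_mhz."""
    # A 1 MHz clock completes one cycle per microsecond.
    return cycles / bus_mhz


def single_bus_mva(n_processors: int, think_cycles: float, service_cycles: float) -> float:
    """Textbook exact MVA for one FCFS queue plus a think (delay) stage.

    Illustrative only: NOT the paper's approximate model. Returns the
    fraction of time a processor spends computing rather than waiting
    for the bus, i.e. think / (think + response).
    """
    queue_len = 0.0            # mean queue length with zero customers
    response = service_cycles  # response time seen by a lone customer
    for n in range(1, n_processors + 1):
        # Arrival theorem: an arriving request sees the queue of an (n-1)-customer system.
        response = service_cycles * (1.0 + queue_len)
        throughput = n / (think_cycles + response)
        # Little's law gives the mean queue length for n customers.
        queue_len = throughput * response
    return think_cycles / (think_cycles + response)


if __name__ == "__main__":
    print(cycles_to_microseconds(1000, 20))  # 1000 cycles on a 20 MHz bus
    print(cycles_to_microseconds(1000, 40))  # 1000 cycles on a 40 MHz bus
    # Hypothetical parameters: 32 processors, 1000 cycles between misses,
    # 20 bus cycles per miss.
    print(single_bus_mva(32, 1000.0, 20.0))
```

In this toy model, efficiency falls as more processors share the one bus, which is the kind of interference effect the paper's more detailed analysis quantifies for the Multicube's bus grid.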