Quantitative system performance: computer system analysis using queueing network models
Quantitative system performance: computer system analysis using queueing network models
Performance models of multiprocessor systems
Performance models of multiprocessor systems
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A mean-value performance analysis of a new multiprocessor architecture
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Probability and Statistics with Reliability, Queuing and Computer Science Applications
Probability and Statistics with Reliability, Queuing and Computer Science Applications
Theory, Volume 1, Queueing Systems
Theory, Volume 1, Queueing Systems
An analysis of dynamic page placement on a NUMA multiprocessor
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
An analytical model of high performance superscalar-based multiprocessors
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A performance evaluation of cluster architectures
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
AMVA techniques for high service time variability
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Evaluation of NUMA Memory Management Through Modeling and Measurements
IEEE Transactions on Parallel and Distributed Systems
SmartApps: An Application Centric Approach to High Performance Computing
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
A design methodology to develop efficient fork-join structures
ISCC '97 Proceedings of the 2nd IEEE Symposium on Computers and Communications (ISCC '97)
The Journal of Supercomputing
Hi-index | 0.00 |
Scalable shared-memory multiprocessors are the subject of much current research, but little is known about the performance behavior of these machines. This paper studies the performance effects of two machine characteristics and two program characteristics that seem to be major factors in determining the performance of a hierarchical shared-memory machine. We develop an analytical model of the traffic in a machine loosely based on Stanford's DASH multiprocessor and use program parameters extracted from multiprocessor traces to study its performance. It is shown that both locality in the data reference stream and the amount of data sharing in a program have an important impact on performance. Although less obvious, the bandwidth within each cluster in the hierarchy also has a significant performance effect. Optimizations that improve the intracluster cache coherence protocol or increase the bandwidth within a cluster can be quite effective.