Clustering processors together at a level of the memory hierarchy in shared address space multiprocessors appears to be an attractive technique from several standpoints: resources are shared, packaging technologies are exploited, and processors within a cluster can share data more effectively. We investigate the performance benefits that clustering can deliver on a range of important scientific and engineering applications running on moderate- to large-scale cache-coherent machines with small degrees of clustering (up to one eighth of the total number of processors in a cluster). We find that, except for applications with near-neighbor communication topologies, this degree of clustering is not very effective at reducing inherent communication-to-computation ratios. Clustering is more useful in reducing the number of remote capacity misses in unstructured applications, and can improve performance substantially when small first-level caches are clustered in these cases. This suggests that clustering at the first-level cache might be useful in highly integrated, relatively fine-grained environments. For less integrated machines such as current distributed shared memory multiprocessors, our results suggest that clustering at the first-level caches does little to improve application performance; however, they also suggest that in a machine with long interprocessor communication latencies, clustering farther from the processor can provide performance benefits.
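The intuition behind the near-neighbor finding can be sketched with a toy model (our own illustration, not taken from the paper): if processors in the same cluster can satisfy each other's references, clustering eliminates exactly the references that cross a cluster boundary. For a near-neighbor pattern most references stay inside a cluster; for a scattered (all-to-all) pattern almost none do. The `remote_fraction` helper and both access patterns below are hypothetical names introduced for this sketch.

```python
# Toy model: fraction of inter-processor references that leave the cluster,
# assuming each processor references each of its neighbors' data once and
# that any reference within the cluster is satisfied locally.
def remote_fraction(num_procs, cluster_size, neighbors_of):
    """Return remote references / total references for a given pattern."""
    remote = total = 0
    for p in range(num_procs):
        for q in neighbors_of(p):
            total += 1
            if p // cluster_size != q // cluster_size:  # crosses a cluster boundary
                remote += 1
    return remote / total

# Near-neighbor pattern (1-D chain): each processor touches its two neighbors.
near = lambda p: [q for q in (p - 1, p + 1) if 0 <= q < 32]
# Unstructured stand-in (all-to-all): references spread over every processor.
allto = lambda p: [q for q in range(32) if q != p]

# With 32 processors in clusters of 4, most near-neighbor traffic is captured,
# while almost all scattered traffic still leaves the cluster.
print(remote_fraction(32, 4, near))   # ~0.23
print(remote_fraction(32, 4, allto))  # ~0.90
```

This is of course far cruder than the cache-coherent simulations in the paper, but it shows why a small clustering degree helps mainly when the communication topology itself is localized.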