Distributed Shared Memory and Compiler-Induced Scalable Locality for Scalable Cluster Performance

Authors:
Mohamed Abdalkader; Ian Burnette; Tim Douglas; David G. Wonnacott
Venue:
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Year:
2012

Abstract

Distributed shared memory software allows a cluster to function as a single collection of many processing cores with a large physical memory, but with highly unusual performance parameters: communication latency and bandwidth between nodes may be several orders of magnitude worse than on-chip. Thus, effective use of such systems requires computation/communication ratios many times higher. The loop optimization known as "time skewing" or "time tiling" can, for some codes, produce arbitrarily high compute balance. It should thus allow scalable high performance regardless of memory and network bandwidth limitations. We have been exploring the scalability of time tiling on homogeneous dedicated clusters, considering the effects of scaling both the number of nodes in the cluster and the ratio of computation speed to network bandwidth. Even with simple 1-D and 2-D Jacobi stencil computations, there are challenges to realizing the predicted scalability in practice.
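
The idea behind time skewing can be illustrated with a minimal sketch of a 1-D Jacobi stencil. This is not the authors' implementation: the function names, the three-point averaging stencil, the fixed boundary values, and the choice to store every time step in full are all assumptions made to keep the example short. The sketch skews the space coordinate by the time step (j = i + t), so that every dependence points to an equal or smaller j, and then executes the iteration space one band of j values at a time. Each band carries a block of grid points through all time steps, so the work per band grows with both its width and the number of time steps, while the data crossing its edges grows only with its perimeter.

```python
def jacobi_naive(initial, steps):
    """Reference 1-D Jacobi: sweep the whole grid once per time step."""
    n = len(initial)
    # Assumption: keep every time step in memory to avoid storage conflicts.
    grid = [list(initial)] + [[0.0] * n for _ in range(steps)]
    for t in range(1, steps + 1):
        grid[t][0] = initial[0]            # fixed boundary values
        grid[t][n - 1] = initial[n - 1]
        for i in range(1, n - 1):
            grid[t][i] = (grid[t - 1][i - 1] + grid[t - 1][i]
                          + grid[t - 1][i + 1]) / 3.0
    return grid[steps]


def jacobi_time_skewed(initial, steps, tile_width):
    """Same computation, executed in bands of the skewed coordinate j = i + t."""
    n = len(initial)
    grid = [list(initial)] + [[0.0] * n for _ in range(steps)]
    for t in range(1, steps + 1):
        grid[t][0] = initial[0]            # boundaries never change
        grid[t][n - 1] = initial[n - 1]

    # Interior points have 1 <= i <= n-2 and 1 <= t <= steps.
    j_min, j_max = 2, (n - 2) + steps
    for j_start in range(j_min, j_max + 1, tile_width):
        j_end = min(j_start + tile_width, j_max + 1)
        # One band: all time steps for this range of skewed positions.
        for t in range(1, steps + 1):
            for j in range(j_start, j_end):
                i = j - t                  # undo the skew
                if 1 <= i <= n - 2:
                    # Inputs sit at skewed positions j-2, j-1, j of step t-1,
                    # so they come from this band or an earlier one.
                    grid[t][i] = (grid[t - 1][i - 1] + grid[t - 1][i]
                                  + grid[t - 1][i + 1]) / 3.0
    return grid[steps]
```

Because each point is computed from the same inputs with the same arithmetic, the two functions return identical values for any tile width. A practical version would also tile the time dimension, compress storage rather than keep every time step, and distribute bands across nodes, none of which this sketch attempts.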