Distributed Shared Memory and Compiler-Induced Scalable Locality for Scalable Cluster Performance

  • Authors:
  • Mohamed Abdalkader, Ian Burnette, Tim Douglas, David G. Wonnacott

  • Venue:
  • CCGRID '12: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012)
  • Year:
  • 2012

Abstract

Distributed shared memory (DSM) software allows a cluster to function as a single machine with many processing cores and a large physical memory, but with highly unusual performance parameters: communication latency and bandwidth between nodes may be several orders of magnitude worse than on-chip. Effective use of such systems therefore requires computation/communication ratios many times higher than on a single multicore node. The loop optimization known as "time skewing" or "time tiling" can, for some codes, produce arbitrarily high compute balance (operations performed per word moved), so it should allow scalable high performance regardless of memory and network bandwidth limitations. We have been exploring the scalability of time tiling on homogeneous, dedicated clusters, examining the effects of scaling both the number of nodes in the cluster and the ratio of computation speed to network bandwidth. Even for simple 1-d and 2-d Jacobi stencil computations, realizing the predicted scalability in practice poses challenges.
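The abstract does not give code or the exact stencil, so the following is only a minimal sketch of what time skewing means for a 1-d Jacobi computation, assuming a three-point averaging update with fixed boundary values. The names `jacobi_naive`, `jacobi_time_tiled` and `tile_traffic`, and the parameters `tile_t` (time steps per tile) and `tile_w` (tile width along the skewed axis), are hypothetical. For clarity the sketch stores the whole space-time array, which a real implementation would replace with a bounded buffer, and it ignores DSM and inter-node communication entirely: the count of values a tile must read from outside itself stands in for communication volume.

```python
"""Time-skewed (time-tiled) 1-d Jacobi stencil: an illustrative sketch only."""


def jacobi_naive(u0, steps):
    """Reference version: sweep the whole array once per time step."""
    n = len(u0)
    cur = list(u0)
    for _ in range(steps):
        nxt = list(cur)  # boundary values cur[0] and cur[n-1] stay fixed
        for i in range(1, n - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def jacobi_time_tiled(u0, steps, tile_t, tile_w):
    """Time-skewed version on a full space-time array u[s][i].

    Point (s, i) maps to skewed coordinate j = i + s. Its inputs
    (s-1, i-1), (s-1, i), (s-1, i+1) sit at skewed coordinates j-2, j-1, j,
    so visiting tiles in increasing j within each band of tile_t time steps
    never reads a value before it has been written.
    """
    n = len(u0)
    u = [list(u0)] + [[0.0] * n for _ in range(steps)]
    for s in range(1, steps + 1):  # fixed boundary values at every time level
        u[s][0], u[s][n - 1] = u0[0], u0[n - 1]
    for s0 in range(1, steps + 1, tile_t):  # bands of tile_t time steps
        s1 = min(s0 + tile_t, steps + 1)
        for jb in range(1 + s0, n - 2 + s1, tile_w):  # tiles along skewed axis
            for s in range(s0, s1):
                lo = max(1, jb - s)  # interior points with jb <= i + s < jb + tile_w
                hi = min(n - 2, jb + tile_w - 1 - s)
                for i in range(lo, hi + 1):
                    u[s][i] = (u[s - 1][i - 1] + u[s - 1][i] + u[s - 1][i + 1]) / 3.0
    return u[steps]


def tile_traffic(tile_t, tile_w):
    """Count point updates and externally supplied inputs for one interior
    tile of tile_t time steps by tile_w skewed columns (array edges ignored)."""
    computed = {(s, j - s) for s in range(tile_t) for j in range(tile_w)}
    external = set()
    for (s, i) in computed:
        for di in (-1, 0, 1):
            src = (s - 1, i + di)
            if src not in computed:
                external.add(src)
    return len(computed), len(external)


if __name__ == "__main__":
    u0 = [0.0] * 30 + [1.0]  # hypothetical initial data with a "hot" right edge
    # Same arithmetic in a legal order, so the results match exactly.
    assert jacobi_time_tiled(u0, 50, tile_t=8, tile_w=6) == jacobi_naive(u0, 50)
    for size in (4, 16, 64):
        upd, ext = tile_traffic(size, size)
        print(f"T=W={size}: {upd} updates / {ext} external inputs = {upd / ext:.2f}")
```

Running it prints ratios of 1.33, 5.33 and 21.33. For an interior tile the counts work out to T·W updates against W + 2T external inputs, so with T = W = n the ratio grows roughly as n/3 without bound. This is the sense in which time tiling can give arbitrarily high compute balance for this stencil; whether that translates into scalable cluster performance is exactly what the abstract says is hard in practice.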