Distributed shared memory software allows a cluster to function as a single collection of many processing cores with a large physical memory, but with highly unusual performance parameters: communication latency and bandwidth between nodes may be several orders of magnitude worse than on-chip. Effective use of such systems therefore requires computation-to-communication ratios many times higher than those needed on a single chip. The loop optimization known as "time skewing" or "time tiling" can, for some codes, produce arbitrarily high compute balance (the ratio of computation performed to data transferred). It should thus allow scalable high performance regardless of memory and network bandwidth limitations. We have been exploring the scalability of time tiling on homogeneous, dedicated clusters, considering the effects of scaling both the number of nodes in the cluster and the ratio of computation speed to network bandwidth. Even with simple 1-D and 2-D Jacobi stencil computations, realizing the predicted scalability in practice poses challenges.
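To make the idea of time tiling concrete, here is a minimal single-node sketch of time skewing for a 1-D, 3-point Jacobi stencil. It is an illustration under stated assumptions, not the authors' implementation: the update rule `(u[i-1] + u[i] + u[i+1]) / 3`, the fixed boundary cells, and the names `jacobi_naive`, `jacobi_time_skewed`, `tile_t` and `tile_j` are all hypothetical choices made for this example. The iteration space (t, i) is skewed to (t, j = i + t), so that every dependence points to a smaller or equal j one step earlier in time, and rectangular tiles in the skewed space can then be executed in lexicographic order. Each tile sweeps `tile_t` time steps before moving on.

```python
from typing import List


def jacobi_naive(u0: List[float], steps: int) -> List[float]:
    """Reference 3-point Jacobi: one full sweep over the array per time step."""
    cur = list(u0)
    nxt = list(u0)  # boundary cells 0 and n-1 stay fixed in both buffers
    for _ in range(steps):
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def jacobi_time_skewed(u0: List[float], steps: int,
                       tile_t: int, tile_j: int) -> List[float]:
    """Time-skewed 3-point Jacobi (illustrative sketch, hypothetical names).

    Iteration (t, i) reads time-t values at i-1, i, i+1 and writes the
    time-(t+1) value at i. In skewed coordinates j = i + t it depends on
    (t-1, j-2), (t-1, j-1), (t-1, j), so rectangular (tile_t x tile_j)
    tiles visited in lexicographic order respect every dependence.
    """
    n = len(u0)
    buf = [list(u0), list(u0)]  # buf[t % 2] holds the values of time step t
    j_lo, j_hi = 1, (n - 2) + (steps - 1)  # range of j = i + t
    for tt in range(0, steps, tile_t):              # time-tile origin
        for jj in range(j_lo, j_hi + 1, tile_j):    # skewed space-tile origin
            for t in range(tt, min(tt + tile_t, steps)):
                src, dst = buf[t % 2], buf[(t + 1) % 2]
                for j in range(jj, min(jj + tile_j, j_hi + 1)):
                    i = j - t                       # un-skew to array index
                    if 1 <= i <= n - 2:             # skip points outside the array
                        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]


if __name__ == "__main__":
    u0 = [float(k % 7) for k in range(40)]
    ref = jacobi_naive(u0, 25)
    out = jacobi_time_skewed(u0, 25, tile_t=5, tile_j=8)
    print(max(abs(a - b) for a, b in zip(ref, out)))
```

Two buffers suffice here because the only iterations that read an old value are exactly the flow predecessors of the iteration that overwrites it, so any dependence-respecting order is safe. As a rough count under these assumptions, an interior tile performs `tile_t * tile_j` updates but needs only about `2 * tile_t + tile_j` values from neighbouring tiles, so enlarging the tile raises the ratio of computation to data moved. This is the property the abstract relies on, although a distributed implementation must also decide how tiles map to nodes.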