Using Time Skewing to Eliminate Idle Time due to Memory Bandwidth and Network Limitations

Venue:
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Year:
2000


Abstract

Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and sufficient cache memory. Thus, it can eliminate processor idle time caused by inadequate main memory bandwidth. In this article, we give a generalization of time skewing for multiprocessor architectures, and discuss time skewing for multilevel caches. Our generalization for multiprocessors lets us eliminate processor idle time caused by any combination of inadequate main memory bandwidth, limited network bandwidth, and high network latency, given a sufficiently large problem and sufficient cache. As in the uniprocessor case, the cache requirement grows with the machine balance rather than the problem size. Our techniques for using multilevel caches reduce the L1 cache requirement, which would otherwise be unacceptably high for some architectures when using arrays of high dimension.
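
To make the basic idea concrete, the following is a minimal sketch of uniprocessor time skewing, assuming a one-dimensional three-point Jacobi stencil with fixed boundary values. It is an illustration only, not the article's algorithm: the multiprocessor and multilevel-cache generalizations are not shown, and the function names (`jacobi_naive`, `jacobi_time_skewed`) and the `tile` parameter are hypothetical. The skewed version walks tiles of the skewed coordinate `j = i + t` from left to right and performs all time steps inside a tile before moving on, so each tile touches on the order of `tile + steps` elements per buffer regardless of the array length.

```python
def jacobi_naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    cur = list(u)
    for _ in range(steps):
        nxt = cur[:]  # copy keeps the two boundary values fixed
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def jacobi_time_skewed(u, steps, tile):
    """Same computation, iterated tile by tile in skewed (i + t) space.

    Assumption for this sketch: two buffers are alternated, with
    buf[t % 2] holding the values of time step t.
    """
    n = len(u)
    buf = [list(u), list(u)]
    # The point produced at time t (1..steps) and index i (1..n-2) has
    # skewed coordinate j = i + t, so j runs from 2 to n - 2 + steps.
    for j0 in range(2, n - 1 + steps, tile):
        # All time steps for this tile run before the next tile starts.
        for t in range(1, steps + 1):
            src, dst = buf[(t - 1) % 2], buf[t % 2]
            # Indices whose skewed coordinate lies in [j0, j0 + tile),
            # clipped to the interior of the array.
            lo = max(1, j0 - t)
            hi = min(n - 1, j0 + tile - t)
            for i in range(lo, hi):
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]
```

Both functions apply the same arithmetic to the same inputs at every point, so their results should agree. Only the order of evaluation changes: every value a point depends on has a skewed coordinate no larger than its own, which is why processing tiles from left to right with time increasing inside each tile respects the dependences in this sketch.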