Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and sufficient cache memory. Thus, it can eliminate processor idle time caused by inadequate main memory bandwidth. In this article, we give a generalization of time skewing for multiprocessor architectures, and discuss time skewing for multilevel caches. Our generalization for multiprocessors lets us eliminate processor idle time caused by any combination of inadequate main memory bandwidth, limited network bandwidth, and high network latency, given a sufficiently large problem and sufficient cache. As in the uniprocessor case, the cache requirement grows with the machine balance rather than the problem size. Our techniques for using multilevel caches reduce the L1 cache requirement, which would otherwise be unacceptably high for some architectures when using arrays of high dimension.
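
As a rough illustration of the uniprocessor idea only, the following sketch applies a skewed tiling to a one-dimensional three-point averaging stencil and checks it against the plain time-step-by-time-step loop. It is not the article's implementation: the stencil, the fixed boundary values, the two-buffer storage, and the names `jacobi_naive`, `jacobi_time_skewed`, and `tile` are all assumptions made for this example, and the multiprocessor and multilevel-cache generalizations are not shown.

```python
def jacobi_naive(u0, steps):
    """Reference version: sweep the whole array once per time step."""
    n = len(u0)
    cur, nxt = list(u0), list(u0)  # boundary cells stay fixed in both buffers
    for _ in range(steps):
        for i in range(1, n - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def jacobi_time_skewed(u0, steps, tile):
    """Same computation, reordered by tiling the skewed coordinate s = i + t.

    The update at (t, i) reads values produced at (t - 1, i - 1..i + 1),
    whose skewed coordinates are all <= i + t, so visiting tiles in
    increasing s, and time steps in increasing t inside each tile,
    respects every dependence (hypothetical sketch, not the paper's code).
    """
    n = len(u0)
    buf = [list(u0), list(u0)]        # buf[t % 2] holds the values at time t
    s_max = (n - 2) + (steps - 1)     # largest skewed coordinate in use
    for s0 in range(1, s_max + 1, tile):
        for t in range(steps):
            # Undo the skew: tile columns s0..s0+tile-1 map to i = s - t.
            lo = max(1, s0 - t)
            hi = min(n - 2, s0 + tile - 1 - t)
            src, dst = buf[t % 2], buf[(t + 1) % 2]
            for i in range(lo, hi + 1):
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]


if __name__ == "__main__":
    u0 = [float((7 * i) % 11) for i in range(40)]
    assert jacobi_time_skewed(u0, 10, 8) == jacobi_naive(u0, 10)
```

In this sketch, one tile touches roughly `tile + steps` elements of each buffer while performing all `steps` updates for its columns, so its working set does not depend on `len(u0)`. That is the sense in which the reordering trades a larger, but problem-size-independent, cache footprint for reuse across time steps.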