The increasing gap between processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep the processors from starving for data. For many applications, this results in a wide disparity between sustained and peak achievable speed. Applications need to be tuned to processor and memory system architectures for cache locality, memory layout, and data prefetch and reuse.

In this paper we investigate optimizations for unstructured iterative applications in which the computational structure remains static or changes only slightly through iterations. Our methods reorganize the data elements to obtain better memory system performance without modifying code fragments.

Our experimental results show that the overall time can be reduced significantly using our optimizations. Further, the overhead of our methods is small enough that they are applicable even if the computational structure does not substantially change for tens of iterations.
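The idea of reorganizing data elements while leaving the computational code untouched can be illustrated with a minimal sketch. The abstract does not name a specific reordering, so the breadth-first renumbering used here is an assumption chosen for illustration, and all function names (`bfs_order`, `remap`, `edge_sweep`, `index_span`) are hypothetical. The sketch renumbers the nodes of a small unstructured graph so that neighboring nodes receive nearby indices, then runs the same edge loop on both layouts.

```python
from collections import deque


def bfs_order(num_nodes, edges):
    """Return order[k] = old index of the node placed at new position k.

    Breadth-first traversal is an assumed stand-in for a locality-improving
    reordering; it is not claimed to be the method used in the paper.
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * num_nodes
    order = []
    for start in range(num_nodes):  # also covers disconnected components
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return order


def remap(order, edges, values):
    """Permute node data and renumber edges to match the new layout."""
    new_index = [0] * len(order)
    for new, old in enumerate(order):
        new_index[old] = new
    new_values = [values[old] for old in order]
    new_edges = [(new_index[u], new_index[v]) for u, v in edges]
    return new_edges, new_values, new_index


def edge_sweep(edges, values):
    """The unchanged kernel: each node accumulates its neighbors' values."""
    out = [0.0] * len(values)
    for u, v in edges:
        out[u] += values[v]
        out[v] += values[u]
    return out


def index_span(edges):
    """Crude locality proxy: total index distance between edge endpoints."""
    return sum(abs(u - v) for u, v in edges)


# A path graph whose nodes were numbered in a scattered order.
edges = [(0, 5), (5, 2), (2, 7), (7, 1), (1, 6), (6, 3), (3, 4)]
values = [float(i) for i in range(8)]

order = bfs_order(len(values), edges)
new_edges, new_values, new_index = remap(order, edges, values)

old_result = edge_sweep(edges, values)
new_result = edge_sweep(new_edges, new_values)
print(index_span(edges), index_span(new_edges))
```

Only the index arrays and the data layout change; `edge_sweep` is identical for both runs, and its results agree once they are mapped back through `new_index`. The index span is only a rough proxy for cache behavior, and a real evaluation would measure run time or cache misses on the target machine.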