Irregular scientific codes suffer poor cache performance because of their irregular memory access patterns. In this paper, we present two new locality-improving techniques for irregular scientific codes. Our techniques exploit geometric structures hidden in data access patterns and computation structures. Our new data reordering (Gpart) finds the graph structure within data accesses and applies hierarchical clustering. Quality partitions are constructed quickly by clustering multiple neighbor nodes, with priority given to high-degree nodes, and repeating for a few passes. Overhead is kept low by clustering multiple nodes in each pass and considering only edges between partitions. Our new computation reordering (Z-Sort) treats the values of index arrays as coordinates and reorders the corresponding computations in Z-curve order. Applied to dense inputs, Z-Sort achieves performance close to that of data reordering combined with other computation reordering, but without the overhead of data reordering. Experiments on irregular scientific codes for a variety of meshes show that locality optimization techniques are effective for both sequential and parallelized codes, improving performance by 60-87 percent. Gpart achieved within 1-2 percent of the performance of more sophisticated partitioning algorithms, but with one third of the overhead. Z-Sort also yields a performance improvement of 64 percent for dense inputs, which is comparable to data reordering combined with computation reordering.
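
The idea behind Z-Sort, treating index-array values as coordinates and visiting computations in Z-curve order, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each computation is described by a pair of entries from two index arrays (here called `left` and `right`, hypothetical names), that all index values are non-negative and fit in `bits` bits, and that the Z-curve key is formed by simple bit interleaving (a Morton code).

```python
def interleave_bits(x, y, bits=16):
    """Build a Z-curve (Morton) key: bit i of x lands at position 2*i,
    bit i of y at position 2*i + 1. Assumes 0 <= x, y < 2**bits."""
    if not (0 <= x < (1 << bits) and 0 <= y < (1 << bits)):
        raise ValueError("index value does not fit in the assumed bit width")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_sort(left, right, bits=16):
    """Reorder iterations (pairs left[i], right[i]) by their Z-curve key.
    Only the computation order changes; the data arrays are untouched."""
    order = sorted(
        range(len(left)),
        key=lambda i: interleave_bits(left[i], right[i], bits),
    )
    return [left[i] for i in order], [right[i] for i in order]


# Hypothetical index arrays: iteration i touches data[left[i]] and data[right[i]].
left = [3, 0, 2, 1]
right = [3, 0, 1, 0]
new_left, new_right = z_sort(left, right)
print(new_left, new_right)
```

Iterations whose index pairs are close in both coordinates receive nearby keys, so they tend to run consecutively and reuse the same cache lines. Because only the iteration order is permuted, this sketch avoids moving the underlying data, which matches the low-overhead motivation stated in the abstract.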