Evaluation of Hierarchical Mesh Reorderings

Authors:
Michelle Mills Strout;Nissa Osheim;Dave Rostron;Paul D. Hovland;Alex Pothen
Affiliations:
Colorado State University, Fort Collins, USA CO 80523;Colorado State University, Fort Collins, USA CO 80523;Colorado State University, Fort Collins, USA CO 80523;Argonne National Laboratory, Argonne, USA IL 60439;Purdue University, West Lafayette, USA IN 47907
Venue:
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Year:
2009

References (16):
Sparse matrix computations: implications for cache designs. Proceedings of the 1992 ACM/IEEE Conference on Supercomputing.
Architecture-independent locality-improving transformations of computational graphs embedded in k-dimensions. ICS '95 Proceedings of the 9th International Conference on Supercomputing.
Improving the memory-system performance of sparse-matrix vector multiplication. IBM Journal of Research and Development.
Improving cache performance in dynamic applications through data and computation reorganization at run time. Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation.
Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication. IEEE Transactions on Parallel and Distributed Systems.
Performance modeling and tuning of an unstructured mesh CFD application. Proceedings of the 2000 ACM/IEEE Conference on Supercomputing.
Evaluating the impact of memory system performance on software prefetching and locality optimizations. ICS '01 Proceedings of the 15th International Conference on Supercomputing.
Computation regrouping: restructuring programs for temporal data cache locality. ICS '02 Proceedings of the 16th International Conference on Supercomputing.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations. SIAM Review.
Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings. International Journal of Parallel Programming.
A Comparison of Locality Transformations for Irregular Codes. LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers.
Performance optimizations and bounds for sparse matrix-vector multiply. Proceedings of the 2002 ACM/IEEE Conference on Supercomputing.
Localizing Non-Affine Array References. PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques.
Exploiting Locality in the Run-Time Parallelization of Irregular Loops. ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing.
Memory Hierarchy Management for Iterative Graph Structures. IPPS '98 Proceedings of the 12th International Parallel Processing Symposium.
Metrics and models for reordering transformations. MSP '04 Proceedings of the 2004 Workshop on Memory System Performance.

Cited by (1):
A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications. Parallel Computing.

Abstract

Irregular and sparse scientific computing programs frequently lose performance because they use the memory system of most machines inefficiently. Previous work has shown that, under a graph model, partitioning the graph and then reordering the data within each partition improves performance. More recent work has shown that reordering heuristics based on a hypergraph model produce better reorderings than those based on a graph model. This paper studies the effects of hierarchical reordering strategies within the hypergraph model. In our experiments, the reorderings are applied to the nodes and elements of tetrahedral meshes, which are inputs to a mesh optimization application. We show that cache performance degrades over time with consecutive packing but not with breadth-first ordering, and that hierarchical reorderings, in which hypergraph partitioning is followed by consecutive packing or breadth-first ordering within each partition, improve overall execution time.
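To make consecutive packing concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes a tetrahedral mesh stored as a list of 4-tuples of node indices (one tuple per element) and reorders only the nodes: each node is renumbered in the order it is first touched while the elements are visited in their current order. The function names `consecutive_packing` and `relabel` are hypothetical.

```python
from typing import List, Sequence, Tuple


def consecutive_packing(elements: Sequence[Sequence[int]], num_nodes: int) -> List[int]:
    """Return new_index, where new_index[old_node] is the node's new number.

    Nodes are numbered in first-touch order while iterating over the
    elements in their current order (a hypothetical helper for illustration).
    """
    new_index = [-1] * num_nodes
    next_id = 0
    for elem in elements:
        for node in elem:
            if new_index[node] == -1:  # first time this node is accessed
                new_index[node] = next_id
                next_id += 1
    # Nodes not referenced by any element keep their relative order at the end.
    for node in range(num_nodes):
        if new_index[node] == -1:
            new_index[node] = next_id
            next_id += 1
    return new_index


def relabel(elements: Sequence[Sequence[int]], new_index: List[int]) -> List[Tuple[int, ...]]:
    """Rewrite each element's node references using the new numbering."""
    return [tuple(new_index[n] for n in elem) for elem in elements]


# Three tetrahedra over eight nodes (toy data, not from the paper).
mesh = [(5, 2, 7, 0), (2, 7, 3, 1), (0, 3, 6, 4)]
perm = consecutive_packing(mesh, 8)
print(perm)                 # [3, 5, 1, 4, 7, 0, 6, 2]
print(relabel(mesh, perm))  # [(0, 1, 2, 3), (1, 2, 4, 5), (3, 4, 6, 7)]
```

After relabeling, nodes that are used by the same or adjacent elements sit next to each other in memory, which is the locality effect consecutive packing aims for.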
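A breadth-first node ordering can be sketched in the hypergraph view, where each element is a hyperedge connecting its four nodes and two nodes are neighbors if they share an element. This is an illustrative sketch only: the mesh layout (a list of 4-tuples of node indices), the choice of the lowest-numbered unvisited node as each BFS root, and the name `bfs_node_order` are assumptions, not details taken from the paper.

```python
from collections import deque
from typing import List, Sequence


def bfs_node_order(elements: Sequence[Sequence[int]], num_nodes: int) -> List[int]:
    """Return new_index[old_node] from a breadth-first traversal of the mesh hypergraph.

    Hypothetical helper: neighbors of a node are all nodes that share an
    element (hyperedge) with it. Disconnected parts are handled by restarting
    the traversal from the lowest-numbered unvisited node.
    """
    # Incidence lists: for each node, the elements (hyperedges) that contain it.
    incident: List[List[int]] = [[] for _ in range(num_nodes)]
    for e, elem in enumerate(elements):
        for node in elem:
            incident[node].append(e)

    new_index = [-1] * num_nodes
    next_id = 0
    for root in range(num_nodes):
        if new_index[root] != -1:
            continue
        new_index[root] = next_id
        next_id += 1
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for e in incident[node]:
                for nbr in elements[e]:
                    if new_index[nbr] == -1:  # number neighbors when first discovered
                        new_index[nbr] = next_id
                        next_id += 1
                        queue.append(nbr)
    return new_index


mesh = [(5, 2, 7, 0), (2, 7, 3, 1), (0, 3, 6, 4)]
print(bfs_node_order(mesh, 8))  # [0, 7, 2, 4, 6, 1, 5, 3]
```

Unlike consecutive packing, this ordering depends only on mesh connectivity and not on the current order of the elements.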
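The hierarchical strategy (partition first, then reorder inside each partition) can be sketched as follows. The partition vector `part` is an assumed input: in a real setup it would come from a hypergraph partitioner, whereas here it is supplied by hand. Within each partition, nodes are numbered by consecutive packing restricted to that partition, so every partition occupies a contiguous block of new indices. The function name `hierarchical_cpack` and the toy partition are hypothetical.

```python
from typing import List, Sequence


def hierarchical_cpack(elements: Sequence[Sequence[int]], part: Sequence[int], num_parts: int) -> List[int]:
    """Return new_index[old_node]: partitions laid out in order, CPACK inside each.

    Hypothetical helper: part[node] gives the partition of each node,
    assumed to be precomputed by some (hypergraph) partitioner.
    """
    num_nodes = len(part)
    new_index = [-1] * num_nodes
    next_id = 0
    for p in range(num_parts):
        # Consecutive packing restricted to nodes of partition p:
        # first-touch order while scanning the elements in their current order.
        for elem in elements:
            for node in elem:
                if part[node] == p and new_index[node] == -1:
                    new_index[node] = next_id
                    next_id += 1
        # Nodes of partition p that no element touches go at the end of p's block.
        for node in range(num_nodes):
            if part[node] == p and new_index[node] == -1:
                new_index[node] = next_id
                next_id += 1
    return new_index


mesh = [(5, 2, 7, 0), (2, 7, 3, 1), (0, 3, 6, 4)]
part = [1, 0, 0, 1, 1, 0, 0, 1]  # hand-made 2-way partition (toy data)
print(hierarchical_cpack(mesh, part, 2))  # [5, 2, 1, 6, 7, 0, 3, 4]
```

Swapping the inner consecutive-packing loop for a breadth-first traversal restricted to each partition would give the other hierarchical variant the abstract mentions.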