Irregular and sparse scientific computing programs frequently lose performance because they use the memory system inefficiently on most machines. Previous work has shown that, under a graph model, partitioning the data and then reordering within each partition improves performance. More recent work has shown that reordering heuristics based on a hypergraph model produce better reorderings than those based on a graph model. This paper studies the effects of hierarchical reordering strategies within the hypergraph model. In our experiments, the reorderings are applied to the nodes and elements of tetrahedral meshes, which are inputs to a mesh optimization application. We show that cache performance degrades over time with consecutive packing, but not with breadth-first ordering, and that hierarchical reorderings involving hypergraph partitioning followed by consecutive packing or breadth-first ordering within each partition improve overall execution time.
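To make the two-level idea concrete, the following is a minimal sketch of a hierarchical node reordering, not the implementation evaluated in the paper. It assumes that a node-to-partition assignment (`part_of`, a hypothetical name) has already been produced by some external hypergraph partitioner; the tiny mesh, the hand-written partition, and all function names are illustrative assumptions. Within each partition, nodes are renumbered either by consecutive packing (first-touch order while walking the elements) or by a breadth-first traversal of the node adjacency induced by the elements.

```python
from collections import defaultdict, deque


def consecutive_packing(elements, nodes):
    """Order `nodes` by first touch while walking `elements` in sequence."""
    wanted = set(nodes)
    order, seen = [], set()
    for elem in elements:
        for n in elem:
            if n in wanted and n not in seen:
                seen.add(n)
                order.append(n)
    # Nodes that no element touches keep their original relative order.
    order.extend(n for n in nodes if n not in seen)
    return order


def breadth_first(elements, nodes):
    """Order `nodes` by BFS over the adjacency induced by shared elements."""
    wanted = set(nodes)
    adj = defaultdict(set)
    for elem in elements:
        members = [n for n in elem if n in wanted]
        for a in members:
            adj[a].update(m for m in members if m != a)
    order, seen = [], set()
    for start in nodes:  # restart the search in each connected component
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            n = queue.popleft()
            order.append(n)
            for m in sorted(adj[n]):
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
    return order


def hierarchical_reorder(elements, part_of, within=consecutive_packing):
    """Return {old_id: new_id}: partitions laid out one after another,
    with nodes inside each partition ordered by `within`."""
    parts = defaultdict(list)
    for n in sorted(part_of):
        parts[part_of[n]].append(n)
    new_order = []
    for p in sorted(parts):
        new_order.extend(within(elements, parts[p]))
    return {old: new for new, old in enumerate(new_order)}


# Illustrative data only: three tetrahedra over eight nodes, with a
# hand-written two-way partition standing in for a partitioner's output.
elements = [(0, 5, 2, 7), (2, 7, 3, 6), (1, 4, 3, 6)]
part_of = {0: 0, 5: 0, 2: 0, 7: 0, 1: 1, 4: 1, 3: 1, 6: 1}

cpack_ids = hierarchical_reorder(elements, part_of, consecutive_packing)
bfs_ids = hierarchical_reorder(elements, part_of, breadth_first)
```

The sketch renumbers nodes only. Elements could be reordered in a similar spirit, for example by sorting them on the smallest new node id they reference, though that choice is an assumption here rather than something the abstract specifies.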