Memory Hierarchy Management for Iterative Graph Structures

Venue: IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Year: 1998

Abstract

The increasing gap in processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep the processors from starving for data. For many applications, this results in a wide disparity between sustained and peak achievable speed. Applications need to be tuned to processor and memory system architectures for cache locality, memory layout, and data prefetch and reuse. In this paper we investigate optimizations for unstructured iterative applications in which the computational structure remains static or changes only slightly through iterations. Our methods reorganize the data elements to obtain better memory system performance without modifying code fragments. Our experimental results show that the overall time can be reduced significantly using our optimizations. Further, the overhead of our methods is small enough that they are applicable even if the computational structure does not substantially change for tens of iterations.
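
The idea of reorganizing data elements while leaving the computational kernel untouched can be illustrated with a small sketch. The abstract does not say which reordering the authors use, so the breadth-first renumbering below is an assumption chosen for simplicity. The names `bfs_order`, `remap`, `sweep`, and `edge_span` are hypothetical, and `edge_span` is only a rough stand-in for locality (the sum of index distances between edge endpoints), not a metric taken from the paper.

```python
from collections import deque


def bfs_order(num_nodes, edges):
    """Return order, where order[k] is the old id of the node placed at slot k.

    Assumption: breadth-first numbering is used purely as an illustrative
    reordering; it is not claimed to be the paper's method.
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * num_nodes
    order = []
    for start in range(num_nodes):  # handles disconnected graphs too
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return order


def remap(order, values, edges):
    """Permute node data and relabel (and sort) edges to match the new order."""
    new_id = [0] * len(order)
    for new, old in enumerate(order):
        new_id[old] = new
    new_values = [values[old] for old in order]
    new_edges = sorted(
        (min(new_id[u], new_id[v]), max(new_id[u], new_id[v])) for u, v in edges
    )
    return new_values, new_edges


def sweep(values, edges):
    """The unchanged kernel: one iteration summing neighbor values per node."""
    out = [0.0] * len(values)
    for u, v in edges:
        out[u] += values[v]
        out[v] += values[u]
    return out


def edge_span(edges):
    """Rough locality proxy: total index distance between edge endpoints."""
    return sum(abs(u - v) for u, v in edges)


if __name__ == "__main__":
    # A path graph whose nodes were numbered in a scrambled order.
    edges = [(0, 4), (4, 2), (2, 5), (5, 1), (1, 3)]
    values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
    order = bfs_order(len(values), edges)
    new_values, new_edges = remap(order, values, edges)
    print("span before:", edge_span(edges), "after:", edge_span(new_edges))
    # The same sweep() runs on either layout; only the data was rearranged.
    print(sweep(new_values, new_edges))
```

The reordering is paid for once and then amortized over every later call to `sweep`. This matches the abstract's point that the approach is worthwhile as long as the structure stays roughly fixed for tens of iterations. In a compiled language with contiguous arrays, the smaller index distances would translate more directly into cache reuse than they do in Python lists.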