Enhancing locality for recursive traversals of recursive structures

Authors:
Youngjoon Jo;Milind Kulkarni
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Year:
2011

Citing 27
Cited 4

Vectorization of tree traversals

Journal of Computational Physics
Vectorization of a treecode

Journal of Computational Physics
Parallel hierarchical N-body methods

Parallel hierarchical N-body methods
Load balancing and data locality in adaptive hierarchical N-body methods: Barnes-Hut, fast multipole, and radiosity

Journal of Parallel and Distributed Computing
Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C

POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Rendering complex scenes with memory-coherent ray tracing

Proceedings of the 24th annual conference on Computer graphics and interactive techniques
Commutativity analysis: a new analysis technique for parallelizing compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
The implementation of the Cilk-5 multithreaded language

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Using generational garbage collection to implement cache-conscious data placement

Proceedings of the 1st international symposium on Memory management
Cache-conscious structure layout

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Parametric shape analysis via 3-valued logic

ACM Transactions on Programming Languages and Systems (TOPLAS)
Distribution-Independent Hierarchical Algorithmsfor the N-body Problem

The Journal of Supercomputing
Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings

International Journal of Parallel Programming
Parallelizing Programs with Recursive Data Structures

IEEE Transactions on Parallel and Distributed Systems
A Data Parallel Formulation of the Barnes-Hut Method for N -Body Simulations

PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
Improving Cache Behavior of Dynamically Allocated Data Structures

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Automatic pool allocation: improving performance by controlling data structure layout in the heap

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Lightcuts: a scalable approach to illumination

ACM SIGGRAPH 2005 Papers
Exploiting Locality for Irregular Scientific Codes

IEEE Transactions on Parallel and Distributed Systems
Optimistic parallelism requires abstractions

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
The jastadd extensible java compiler

Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
On fast Construction of SAH-based Bounding Volume Hierarchies

RT '07 Proceedings of the 2007 IEEE Symposium on Interactive Ray Tracing
Deep Coherent Ray Tracing

RT '07 Proceedings of the 2007 IEEE Symposium on Interactive Ray Tracing
A scalable auto-tuning framework for compiler optimization

IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Automatically enhancing locality for tree traversals with traversal splicing

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
General transformations for GPU execution of tree traversals

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Automatic vectorization of tree traversals

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

While there has been decades of work on developing automatic, locality-enhancing transformations for regular programs that operate over dense matrices and arrays, there has been little investigation of such transformations for irregular programs, which operate over pointer-based data structures such as graphs, trees and lists. In this paper, we argue that, for a class of irregular applications we call traversal codes, there exists substantial data reuse and hence opportunity for locality exploitation. We develop a novel optimization called point blocking, inspired by the classic tiling loop transformation, and show that it can substantially enhance temporal locality in traversal codes. We then present a transformation and optimization framework called TreeTiler that automatically detects opportunities for applying point blocking and applies the transformation. TreeTiler uses autotuning techniques to determine appropriate parameters for the transformation. For a series of traversal algorithms drawn from real-world applications, we show that TreeTiler is able to deliver performance improvements of up to 245% over an optimized (but non-transformed) parallel baseline, and in several cases, significantly better scalability.