Locality optimization of stencil applications using data dependency graphs

Authors:
Daniel Orozco; Elkin Garcia; Guang Gao
Affiliations:
University of Delaware, Electrical and Computer Engineering Department; University of Delaware, Electrical and Computer Engineering Department; University of Delaware, Electrical and Computer Engineering Department
Venue:
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Year:
2010



Abstract

This paper proposes tiling techniques based on data dependencies rather than on code structure. The work leverages and extends previous work by the authors on nontraditional tiling for parallel applications. The main contributions of this paper are: (1) a formal description of tiling from the point of view of the data produced rather than of the source code; (2) a mathematical proof of an optimum tiling, in terms of maximum reuse, for stencil applications, addressing the disparity between computation power and memory bandwidth in many-core architectures; (3) a description and implementation of our tiling technique for well-known stencil applications; and (4) experimental evidence that confirms the effectiveness of the proposed tiling in alleviating that disparity. Our experiments, performed on one of the first Cyclops-64 many-core chips produced, confirm that our approach reduces both the total number of memory operations of stencil applications and their running time.
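
To make the idea of reuse concrete, the following is a minimal sketch of a 1D three-point stencil computed two ways: sweeping the whole array once per time step, and advancing each tile through several time steps after loading it (plus a halo) once. It is an illustration only and not the tiling derived in the paper; the overlapped-tile scheme, the function names (`step`, `naive`, `tiled`), and the load-counting model (one load per element read from main memory) are all assumptions made for this example.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; the two end cells stay fixed."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Sweep the full array once per time step.

    Assumed cost model: every sweep reloads all elements from main memory.
    """
    loads = 0
    for _ in range(steps):
        loads += len(u)
        u = step(u)
    return u, loads


def tiled(u, steps, tile):
    """Overlapped tiling (hypothetical scheme, not the paper's).

    Each tile is loaded once together with a halo of `steps` cells per side,
    then advanced `steps` time steps locally. Halo cells become stale one
    cell per step from the cut edge inward, so only the tile interior is
    written back.
    """
    n = len(u)
    out = list(u)
    loads = 0
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)   # left edge of tile plus halo
        hi = min(end + steps, n)     # right edge of tile plus halo
        local = u[lo:hi]
        loads += hi - lo             # one load per element, once per tile
        for _ in range(steps):
            local = step(local)
        out[start:end] = local[start - lo:end - lo]
    return out, loads


if __name__ == "__main__":
    u0 = [float(i % 7) for i in range(64)]
    ref, naive_loads = naive(u0, 4)
    res, tiled_loads = tiled(u0, 4, 16)
    print(ref == res, naive_loads, tiled_loads)
```

Under this simplified model, the tiled version reads each element from main memory roughly once per group of time steps instead of once per step, at the price of redundant computation in the halos. The paper's contribution concerns which tile shape maximizes this kind of reuse; the sketch does not attempt to reproduce that result.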