POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
A data locality optimizing algorithm
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture
HPCS '06 Proceedings of the 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment
Effective automatic parallelization of stencil computations
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Phasers: a unified deadlock-free construct for collective and point-to-point synchronization
Proceedings of the 22nd annual international conference on Supercomputing
Mapping the LU decomposition on a many-core architecture: challenges and solutions
Proceedings of the 6th ACM conference on Computing frontiers
Mapping the FDTD Application to Many-Core Chip Architectures
ICPP '09 Proceedings of the 2009 International Conference on Parallel Processing
Optimized dense matrix multiplication on a many-core architecture
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Computers & Mathematics with Applications
Hi-index | 0.00 |
This paper proposes tiling techniques based on data dependencies and not in code structure. The work presented here leverages and expands previous work by the authors in the domain of non traditional tiling for parallel applications. The main contributions of this paper are: (1) A formal description of tiling from the point of view of the data produced and not from the source code. (2) A mathematical proof for an optimum tiling in terms of maximum reuse for stencil applications, addressing the disparity between computation power and memory bandwidth for many-core architectures. (3) A description and implementation of our tiling technique for well known stencil applications. (4) Experimental evidence that confirms the effectiveness of the tiling proposed to alleviate the disparity between computation power and memory bandwidth for many-core architectures. Our experiments, performed using one of the first Cyclops-64 many-core chips produced, confirm the effectiveness of our approach to reduce the total number of memory operations of stencil applications as well as the running time of the application.