An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors

Authors:
Guilin Chen;Mahmut Kandemir
Affiliations:
The Pennsylvania State University, USA;The Pennsylvania State University, USA
Venue:
Transactions on High-Performance Embedded Architectures and Compilers I
Year:
2007

Citing 17
Cited 1

Automatic translation of FORTRAN programs to vector form

ACM Transactions on Programming Languages and Systems (TOPLAS)
A theory of loop permutations

Selected papers of the second workshop on Languages and compilers for parallel computing
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Exploring the design space for a shared-cache multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Automatic optimization of communication in compiling out-of-core stencil codes

ICS '96 Proceedings of the 10th international conference on Supercomputing
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Compiling stencils in high performance Fortran

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
A Single-Chip Multiprocessor

Computer
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
A Singular Loop Transformation Framework Based on Non-Singular Matrices

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Exact solution of linear equations

SYMSAC '71 Proceedings of the second ACM symposium on Symbolic and algebraic manipulation
Transient-fault recovery for chip multiprocessors

Proceedings of the 30th annual international symposium on Computer architecture
The future of multiprocessor systems-on-chips

Proceedings of the 41st annual Design Automation Conference
The Future of Microprocessors

Queue - Multiprocessors

Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency

ACM Transactions on Design Automation of Electronic Systems (TODAES)

Quantified Score

Hi-index	0.00

Visualization

Abstract

The tighter integration on chip multiprocessors exerts a higher pressure on off-chip accesses to the memory system. This makes minimizing the number of off-chip accesses a critical optimization goal. This paper discusses a compiler-based solution to this problem for the embedded applications that perform stencil computations. An important characteristic of this solution is that it distinguishes between the intra-processor data reuse and inter-processor data reuse. The first of these captures the data reuse that occurs across loop iterations assigned to the same processor, whereas the second one represents the data reuse that takes place across the loop iterations assigned to different processors. The proposed approach then optimizes inter-processor reuse by re-organizing the loop iterations of each processor carefully, considering how data elements are shared across processors. The goal is to ensure that the different processors access the shared data within a short period of time, so that the data can be captured in the on-chip memory space at the time of the reuse. This paper also presents an evaluation of the proposed optimization and compares it to an alternate scheme that optimizes data locality for each processor in isolation. The results obtained by applying our implementation to eight loop-intensive benchmark codes from the embedded computing domain show that our approach improves over the mentioned alternate scheme by 15.6% on average.