The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling for numa parallel machines
Compiling for numa parallel machines
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
A compiler algorithm for optimizing locality in loop nests
ICS '97 Proceedings of the 11th international conference on Supercomputing
Proceedings of the 6th international workshop on Hardware/software codesign
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Nonsingular Data Transformations: Definition, Validity, and Applications
International Journal of Parallel Programming
An energy saving strategy based on adaptive loop parallelization
Proceedings of the 39th annual Design Automation Conference
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
A GSA-based compiler infrastructure to extract parallelism from complex loops
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
An Approach to Parallelizing Non-Uniform Loops with the Omega Calculator
PARELEC '02 Proceedings of the International Conference on Parallel Computing in Electrical Engineering
Integrating loop and data optimizations for locality within a constraint network based framework
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Hi-index | 0.00 |
As chip multiprocessors proliferate, programming support for these devices is likely to receive a lot of attention in the near future. Parallelism and data locality are two critical issues in a chip multiprocessor environment. Unfortunately, most of the published work in the literature focuses only on one of these problems, and this can prevent one from achieving the best possible performance. The main goal of this paper is to propose and evaluate a compiler-directed code parallelization scheme, which considers both parallelism and data locality at the same time. Our compiler captures the inherent parallelism and data reuse in the application code being analyzed using a novel representation called the locality-parallelism graph (LPG). Our partitioning/scheduling algorithm assigns the nodes of this graph to the processors in the architecture and schedules them for execution. We implemented this algorithm and evaluated its effectiveness using a set of benchmark codes. The results collected so far indicate that our approach improves overall execution latency significantly. In this paper, we also introduce an ILP (Integer Linear Programming) based formulation of the problem, and implement the schedule obtained by the ILP solver. The results indicate that our approach gets within 4% of the ILP solution.