Code scheduling for optimizing parallelism and data locality

Authors:
Taylan Yemliha;Mahmut Kandemir;Ozcan Ozturk;Emre Kultursay;Sai Prashanth Muralidhara
Affiliations:
Syracuse University;Pennsylvania State University;Bilkent University;Pennsylvania State University;Pennsylvania State University
Venue:
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Year:
2010

Abstract

As chip multiprocessors proliferate, programming support for these devices is likely to receive considerable attention in the near future. Parallelism and data locality are two critical issues in a chip multiprocessor environment. Unfortunately, most published work focuses on only one of these problems, which can prevent one from achieving the best possible performance. The main goal of this paper is to propose and evaluate a compiler-directed code parallelization scheme that considers parallelism and data locality at the same time. Our compiler captures the inherent parallelism and data reuse in the application code being analyzed using a novel representation called the locality-parallelism graph (LPG). Our partitioning/scheduling algorithm assigns the nodes of this graph to the processors in the architecture and schedules them for execution. We implemented this algorithm and evaluated its effectiveness using a set of benchmark codes. The results collected so far indicate that our approach significantly improves overall execution latency. We also introduce an ILP (Integer Linear Programming) based formulation of the problem and implement the schedule obtained by the ILP solver. The results indicate that our approach comes within 4% of the ILP solution.
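
The abstract does not spell out the LPG or the partitioning/scheduling algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical graph encoding in which nodes carry an execution cost, dependence edges impose ordering, and reuse edges carry a made-up "cost saved" weight that applies when two nodes run on the same processor. The function name `schedule_lpg` and all parameters are assumptions introduced for illustration; a simple greedy list scheduler stands in for the paper's algorithm.

```python
from collections import defaultdict

def schedule_lpg(costs, deps, reuse, num_procs):
    """Greedy list scheduling over a hypothetical locality-parallelism graph.

    costs: {node: execution cost}
    deps:  iterable of (u, v), meaning v may start only after u finishes
    reuse: {(u, v): cost saved when u and v run on the same processor}
           (an assumed model of data reuse, not taken from the paper)
    Returns {node: (processor, start, finish)}.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indeg = {n: 0 for n in costs}
    for u, v in deps:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1

    # Make reuse weights symmetric so lookup order does not matter.
    saving = defaultdict(int)
    for (u, v), w in reuse.items():
        saving[(u, v)] = w
        saving[(v, u)] = w

    ready = sorted(n for n in costs if indeg[n] == 0)
    proc_free = [0] * num_procs            # time at which each processor is idle
    proc_nodes = [[] for _ in range(num_procs)]
    placed = {}

    while ready:
        node = ready.pop(0)
        # A node cannot start before all of its predecessors have finished.
        earliest = max((placed[p][2] for p in preds[node]), default=0)
        best = None
        for proc in range(num_procs):
            # Locality: nodes already on this processor may shorten the run.
            bonus = sum(saving[(node, m)] for m in proc_nodes[proc])
            run = max(1, costs[node] - bonus)
            start = max(earliest, proc_free[proc])
            cand = (start + run, proc, start)
            if best is None or cand < best:
                best = cand
        finish, proc, start = best
        placed[node] = (proc, start, finish)
        proc_free[proc] = finish
        proc_nodes[proc].append(node)
        for s in succs[node]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
        ready.sort()
    return placed

# Two independent chains (a -> c, b -> d), each with reuse along the chain.
placed = schedule_lpg(
    costs={"a": 4, "b": 4, "c": 4, "d": 4},
    deps=[("a", "c"), ("b", "d")],
    reuse={("a", "c"): 2, ("b", "d"): 2},
    num_procs=2,
)
```

In this toy input the scheduler runs the two chains in parallel on separate processors while keeping each chain's nodes together, which is the kind of joint parallelism/locality trade-off the abstract describes. A greedy heuristic like this gives no optimality guarantee; an exact formulation such as the ILP mentioned in the abstract would serve as the reference point.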