Slicing based code parallelization for minimizing inter-processor communication

Authors:
Mahmut Kandemir;Yuanrui Zhang;Sai Prasanth Muralidhara;Ozcan Ozturk;Sri Hari Krishna Narayanan
Affiliations:
The Pennsylvania State University, State College, PA, USA;The Pennsylvania State University, State College, PA, USA;The Pennsylvania State University, State College, PA, USA;Bilkent University, Ankara, Turkey;The Pennsylvania State University, State College, PA, USA
Venue:
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2009

Abstract

A central problem in distributed-memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. The scheme targets distributed-memory multi-core architectures and formulates data-computation distribution (partitioning) across parallel processors as a slicing problem: starting from a partitioning of the output arrays, it iteratively determines the partitions of the other arrays and of the iteration spaces of the loop nests in the application code, with the goal of minimizing inter-processor data communication. We also propose a solution scheme for this slicing-based formulation. We evaluate the proposed data-computation distribution scheme on six data-intensive benchmark programs and compare it against three alternative data-computation distribution schemes. The results are encouraging: with 16 processors, the scheme achieves around 10% better speedup than the next-best scheme, averaged over all benchmark codes tested.
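The idea of starting from a partitioning of the output arrays and working backwards can be illustrated with a toy sketch. This is not the paper's algorithm. It is a minimal illustration under strong assumptions: loops are one-dimensional, each array is written by exactly one loop, there are no loop-carried dependences, and loops are processed in reverse program order. The `Loop` class, the `slice_partitions` function, and the two-loop example program are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Loop:
    """Hypothetical 1-D loop: for i in range(n): out[write(i)] = f(src[read(i)], ...)."""
    n: int
    out: str
    write: Callable[[int], int]
    reads: List[Tuple[str, Callable[[int], int]]]


def slice_partitions(loops: List[Loop], output: str, output_parts: List[range]):
    """Propagate an output-array partitioning backwards through the loops.

    Returns, per processor, the iterations of each loop it executes and the
    elements of each array it touches (assumption: one writer loop per array).
    """
    data: List[Dict[str, Set[int]]] = [{output: set(p)} for p in output_parts]
    iters: List[Dict[int, Set[int]]] = [dict() for _ in output_parts]
    for idx in range(len(loops) - 1, -1, -1):  # last loop first
        loop = loops[idx]
        for p in range(len(output_parts)):
            needed = data[p].get(loop.out, set())
            # Iterations that produce elements this processor needs.
            mine = {i for i in range(loop.n) if loop.write(i) in needed}
            iters[p][idx] = mine
            # Elements those iterations read become part of the processor's slice.
            for name, read in loop.reads:
                data[p].setdefault(name, set()).update(read(i) for i in mine)
    return iters, data


# Toy program (hypothetical):
#   for i in 0..7: B[i] = A[i] + A[i+1]
#   for i in 0..7: C[i] = 2 * B[i]
loops = [
    Loop(8, "B", lambda i: i, [("A", lambda i: i), ("A", lambda i: i + 1)]),
    Loop(8, "C", lambda i: i, [("B", lambda i: i)]),
]
# Block-partition the output array C across two processors.
iters, data = slice_partitions(loops, "C", [range(0, 4), range(4, 8)])

# Elements of A needed by both processors: candidates for replication
# or inter-processor communication.
shared_a = data[0]["A"] & data[1]["A"]
print(sorted(shared_a))
```

In this toy case each processor executes the four iterations of both loops that feed its block of `C`, and only element `A[4]` is needed by both processors. A partitioning scheme of the kind described in the abstract would aim to choose partitions that keep such shared sets small.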