Interprocedural slicing using dependence graphs
ACM Transactions on Programming Languages and Systems (TOPLAS)
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Using Program Slicing in Software Maintenance
IEEE Transactions on Software Engineering
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Generating local addresses and communication sets for data-parallel programs
Journal of Parallel and Distributed Computing
Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Iteration space slicing and its application to communication optimization
ICS '97 Proceedings of the 11th international conference on Supercomputing
Reuse-driven interprocedural slicing
Proceedings of the 20th international conference on Software engineering
Parallelizing DSP nested loops on reconfigurable architectures using data context switching
Proceedings of the 38th annual Design Automation Conference
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Combining dependence and data-flow analyses to optimize communication
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Resource-Based Communication Placement Analysis
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
ICSE '81 Proceedings of the 5th international conference on Software engineering
Reuse-Driven Interprocedural Slicing in the Presence of Pointers and Recursions
ICSM '99 Proceedings of the IEEE International Conference on Software Maintenance
Automatic computation and data decomposition for multiprocessors
Automatic computation and data decomposition for multiprocessors
Compiler Techniques for the Distribution of Data and Computation
IEEE Transactions on Parallel and Distributed Systems
Automatic parallel code generation for tiled nested loops
Proceedings of the 2004 ACM symposium on Applied computing
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
International Journal of High Performance Computing Applications
Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Compilation Techniques for Optimizing Communication on Distributed-Memory Systems
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 02
Experiences with enumeration of integer projections of parametric polytopes
CC'05 Proceedings of the 14th international conference on Compiler Construction
Hi-index | 0.00 |
One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested.