High-performance computer architecture
Stencils and problem partitionings: their influence on the performance of multiple processor systems
IEEE Transactions on Computers
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
The influence of parallel decomposition strategies on the performance of multiprocessor systems
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
On program restructuring, scheduling, and communication for parallel processor systems
Beyond loop partitioning: data assignment and overlap to reduce communication overhead
ICS '91 Proceedings of the 5th international conference on Supercomputing
Automatic transformation of FORTRAN loops to reduce cache conflicts
ICS '91 Proceedings of the 5th international conference on Supercomputing
Fortran at ten gigaflops: the Connection Machine convolution compiler
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Seismic modeling at 14 gigaflops on the Connection Machine
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Model-driven mapping onto distributed memory parallel computers
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers
ACM Transactions on Computer Systems (TOCS)
PARADIGM: a compiler for automatic data distribution on multicomputers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing
IEEE Transactions on Computers
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
A General Data Layout for Distributed Consistency in Data Parallel Applications
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
An Adaptive Approach to Data Placement
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Fortran RED - A Retargetable Environment for Automatic Data Layout
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
This paper uses bottom-up, static program partitioning to minimize the execution time of parallel programs by reducing interprocessor communication. Program partitioning is applied to a parallel programming construct known as a sequentially iterated parallel loop. The paper develops and evaluates compiler techniques that automatically generate data partitions for sequentially iterated parallel loops so as to minimize interprocessor communication; these techniques could be included as a communication-reducing back end for existing parallelizing compilers.

A multiprocessor model is defined to analyze the program execution factors that lead to interprocessor communication, and a model of the sequentially iterated parallel loop is defined to quantify communication patterns within a program. A vector notation is chosen to quantify communication across a global data set. Communication parameters derived from this notation are used in equations that are shown to generate rectangular and hexagonal partition shapes minimizing interprocessor communication.

Experiments measuring the impact of communication on program performance, and the improvement gained by the automatically generated data partitions, were conducted on a sixteen-processor Encore Multimax. The new method outperformed established data partitioning methods.