Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors

Authors:
Sudarsan Tandri;Tarek S. Abdelrahman
Affiliations:
-;-
Venue:
ICPP '97 Proceedings of the international Conference on Parallel Processing
Year:
1997

Citing 12
Cited 19

Hector: A Hierarchically Structured Shared-Memory Multiprocessor

Computer - Special issue on experimental research in computer architecture
Exploiting operating system support for dynamic page placement on a NUMA shared memory multiprocessor

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
The high performance Fortran handbook

The high performance Fortran handbook
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic data layout for high performance Fortran

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A novel approach towards automatic data distribution

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Compiling Communication-Efficient Programs for Massively Parallel Machines

IEEE Transactions on Parallel and Distributed Systems
Automatic Data and Computation Partitioning on Scalable Shared Memory Multiprocessors

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Polaris: Improving the Effectiveness of Parallelizing Compilers

LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Automatic computation and data partitioning on scalable shared-memory multiprocessors

Automatic computation and data partitioning on scalable shared-memory multiprocessors

Automatic data layout for distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
A compiler technique for improving whole-program locality

POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static and Dynamic Locality Optimizations Using Integer Linear Programming

IEEE Transactions on Parallel and Distributed Systems
Data Relation Vectors: A New Abstraction for Data Optimizations

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

The Journal of Supercomputing
Compiler Support for Array Distribution onNUMA Shared Memory Multiprocessors

The Journal of Supercomputing
A Layout-Conscious Iteration Space Transformation Technique

IEEE Transactions on Computers
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Overlap of computation and communication on shared-memory networks-of-workstations

Cluster computing
ARS: an adaptive runtime system for locality optimization

Future Generation Computer Systems - Tools for program development and analysis
Quasidynamic Layout Optimizations for Improving Data Locality

IEEE Transactions on Parallel and Distributed Systems
Improving whole-program locality using intra-procedural and inter-procedural transformations

Journal of Parallel and Distributed Computing
2D data locality: definition, abstraction, and application

ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Memory access behavior analysis of NUMA-based shared memory programs

Scientific Programming
Application mapping for chip multiprocessors

Proceedings of the 45th annual Design Automation Conference
Integrated code and data placement in two-dimensional mesh based chip multiprocessors

Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Matching memory access patterns and data placement for NUMA systems

Proceedings of the Tenth International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationships between where computations are performed and where data is located based on array accesses in the program. The algorithm then uses these affinity relationships to determine both static and dynamic partitions for arrays and parallel loops. Experimental results from a prototype implementation of the algorithm demonstrate that it is computationally efficient and that it improves the parallel performance of standard benchmarks. The results also show the necessity of taking shared memory effects (memory contention, cache locality, false-sharing and synchronization) into account---partitions derived to minimize only interprocessor communications do not necessarily result in the best performance.