Data distribution support on distributed shared memory multiprocessors

Authors:
Rohit Chandra; Ding-Kai Chen; Robert Cox; Dror E. Maydan; Nenad Nedeljkovic; Jennifer M. Anderson
Affiliations:
Silicon Graphics Computer Systems, Mountain View, CA; Silicon Graphics Computer Systems, Mountain View, CA; Silicon Graphics Computer Systems, Mountain View, CA; Silicon Graphics Computer Systems, Mountain View, CA; Silicon Graphics Computer Systems, Mountain View, CA; Digital Western Research Lab, Palo Alto, CA
Venue:
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Year:
1997



Abstract

Cache-coherent multiprocessors with distributed shared memory are becoming increasingly popular for parallel computing. However, obtaining high performance on these machines requires that an application execute with good data locality. In addition to making effective use of caches, it is often necessary to distribute data structures across the local memories of the processing nodes, thereby reducing the latency of cache misses.

We have designed a set of abstractions for performing data distribution in the context of explicitly parallel programs and implemented them within the SGI MIPSpro compiler system. Our system incorporates many unique features to enhance both programmability and performance. We address the former by providing a very simple programming model with extensive support for error detection. Regarding performance, we carefully design the user abstractions with the underlying compiler optimizations in mind, we incorporate several optimization techniques to generate efficient code for accessing distributed data, and we provide a tight integration of these techniques with other optimizations within the compiler. Our initial experience suggests that the directives are easy to use and can yield substantial performance gains, in some cases by as much as a factor of 3 over the same codes without distribution.
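To illustrate why the placement of data across node-local memories matters, the following is a minimal sketch, not the MIPSpro directive syntax or the authors' implementation. It assumes two generic placement schemes (block and cyclic), which the abstract does not name, and a parallel loop whose iterations are divided among nodes in contiguous blocks. All function names are hypothetical. The sketch counts how many element accesses land in the memory of the node that executes the iteration.

```python
def block_owner(index, n_elements, n_nodes):
    """Node that holds `index` when elements are placed in contiguous blocks."""
    if not 0 <= index < n_elements:
        raise IndexError("index out of range")
    block = -(-n_elements // n_nodes)  # ceiling division: elements per node
    return index // block


def cyclic_owner(index, n_nodes):
    """Node that holds `index` when elements are dealt out round-robin."""
    return index % n_nodes


def local_fraction(n_elements, n_nodes, data_owner):
    """Fraction of accesses served from local memory.

    Assumption: iteration i touches only element i and runs on the node
    chosen by a block partition of the iteration space.
    """
    local = 0
    for i in range(n_elements):
        executing_node = block_owner(i, n_elements, n_nodes)
        if data_owner(i) == executing_node:
            local += 1
    return local / n_elements


if __name__ == "__main__":
    n, p = 16, 4
    print(local_fraction(n, p, lambda i: block_owner(i, n, p)))
    print(local_fraction(n, p, lambda i: cyclic_owner(i, p)))
```

Under these assumptions, a placement that matches the loop partition keeps every access local, while a mismatched placement sends most accesses to remote memory. That mismatch is the kind of cache-miss latency the abstract says data distribution is meant to reduce.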