Optimizing irregular shared-memory applications for clusters

Authors:
Seung-Jai Min;Rudolf Eigenmann
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the 22nd annual international conference on Supercomputing
Year:
2008

Abstract

Irregular applications are difficult to optimize for communication because their irregular data accesses are hard to analyze accurately and efficiently. The difficulty is especially acute when translating irregular shared-memory applications to message-passing form for clusters: without effective irregular data analysis, the translation system generates unnecessary or redundant communication, which limits application scalability. In this paper, we present a Lean Distributed Shared Memory (LDSM) system that features a fast and accurate irregular data access (IDA) analysis. The analysis uses a region-based diff method and relies on a runtime library optimized for irregular applications. We describe three optimizations that improve LDSM performance. A parallel array reduction transformation reduces the overhead of the analysis. A packed communication optimization and a differential communication optimization eliminate unnecessary and redundant messages. We evaluate the optimized LDSM system on a set of representative irregular benchmarks. On average, the optimized LDSM executes irregular applications 45% faster than the hand-tuned MPI versions.
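The abstract names a region-based diff and packed communication without giving their mechanics. The following is a minimal sketch of one way such a scheme could work, not the LDSM implementation: a saved copy of the array (a "twin") is compared with the current array over a bounding region, and only the changed elements are packed into a single message. The function names `region_diff`, `pack`, and `apply_packed`, and the use of the index array's minimum and maximum as the region bounds, are all assumptions made for illustration.

```python
def region_diff(twin, current, lo, hi):
    """Return (index, new_value) pairs for elements in [lo, hi) that changed.

    Hypothetical helper: only the region [lo, hi) is scanned, not the whole array.
    """
    return [(i, current[i]) for i in range(lo, hi) if current[i] != twin[i]]


def pack(diffs):
    """Pack the changed elements into one message: (indices, values)."""
    indices = [i for i, _ in diffs]
    values = [v for _, v in diffs]
    return indices, values


def apply_packed(remote, message):
    """Apply a packed message to a remote copy of the array."""
    indices, values = message
    for i, v in zip(indices, values):
        remote[i] = v


# Irregular update through an index array (assumed example data).
twin = [0.0] * 8
current = list(twin)
idx = [5, 2, 5]
for k in idx:
    current[k] += 1.0

# Assumed region: the bounding interval of the indices that were written.
lo, hi = min(idx), max(idx) + 1
message = pack(region_diff(twin, current, lo, hi))

# A remote process that still holds the old values receives one message.
remote = list(twin)
apply_packed(remote, message)
```

In this sketch only two elements travel in one message instead of the whole array or one message per write, which is the kind of saving the packed and differential communication optimizations aim for.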