Writing high-performance parallel applications becomes even more challenging when irregular, sparse, or adaptive methods are employed. In this paper we introduce compiler and runtime support for programs with indirect array accesses into Titanium, a high-level language that combines an explicit SPMD parallelism model with implicit communication through a global shared address space. By combining the well-known inspector-executor technique with high-level multi-dimensional array constructs, compiler analysis, and performance modeling, we demonstrate optimizations that are entirely hidden from the programmer. The global address space makes the programs easier to write than message passing, with remote array accesses used in place of explicit messages with data packing and unpacking. The programs are also faster than message-passing programs: using sparse matrix-vector multiplication programs, we show that the Titanium code is on average 21% faster across several matrices and machines, with a best-case speedup of more than a factor of 2. The performance advantages are due both to the lightweight RDMA (Remote Direct Memory Access) communication model underlying the Titanium implementation and to automatic optimization selection that adapts the communication to the machine and workload, in some cases using different communication models for different processors within a single computation.
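To make the inspector-executor idea concrete, the following is a minimal single-process sketch in plain Java (not Titanium) of how the technique applies to sparse matrix-vector multiply with indirect accesses. All identifiers (InspectorExecutorSpmv, fetchRemote, the CSR sample data) are hypothetical illustrations, and the "remote" vector is simulated by a local function where a PGAS runtime would instead issue one coalesced RDMA gather; this is an assumed reconstruction of the pattern, not the paper's implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the inspector-executor pattern for y = A*x in CSR form,
// where some column indices refer to "remote" elements of x.
public class InspectorExecutorSpmv {
    // CSR storage for the local rows of A (sample data).
    static int[] rowPtr = {0, 2, 4, 6};
    static int[] colIdx = {0, 5, 1, 6, 2, 7};   // columns 5..7 live "remotely"
    static double[] val = {1, 2, 3, 4, 5, 6};

    static double[] xLocal = {10, 20, 30};       // locally owned part of x
    static final int LOCAL_LEN = 3;

    // Stand-in for fetching one element of the distributed vector x.
    static double fetchRemote(int globalIdx) { return 100.0 * globalIdx; }

    public static void main(String[] args) {
        // Inspector: scan the index array once, record each distinct
        // remote index, and remap colIdx into a local+ghost numbering.
        Map<Integer, Integer> ghostSlot = new HashMap<>();
        List<Integer> schedule = new ArrayList<>(); // remote indices to gather
        int[] localizedCol = new int[colIdx.length];
        for (int k = 0; k < colIdx.length; k++) {
            int c = colIdx[k];
            if (c < LOCAL_LEN) {
                localizedCol[k] = c;              // already local
            } else {
                Integer slot = ghostSlot.get(c);
                if (slot == null) {
                    slot = LOCAL_LEN + schedule.size();
                    ghostSlot.put(c, slot);
                    schedule.add(c);              // fetch this index only once
                }
                localizedCol[k] = slot;
            }
        }

        // Communication: gather all needed remote elements into a ghost
        // region appended to the local vector (one coalesced transfer in
        // a real runtime, rather than one message per element).
        double[] x = new double[LOCAL_LEN + schedule.size()];
        System.arraycopy(xLocal, 0, x, 0, LOCAL_LEN);
        for (int i = 0; i < schedule.size(); i++)
            x[LOCAL_LEN + i] = fetchRemote(schedule.get(i));

        // Executor: plain CSR SpMV over purely local data.
        double[] y = new double[rowPtr.length - 1];
        for (int r = 0; r < y.length; r++)
            for (int k = rowPtr[r]; k < rowPtr[r + 1]; k++)
                y[r] += val[k] * x[localizedCol[k]];

        for (double v : y) System.out.println(v);
    }
}

The key property the sketch shows is that the inspection cost is paid once: if the index structure is unchanged across iterations (as in repeated SpMV within an iterative solver), the schedule and remapped indices can be reused, leaving only the coalesced gather and the purely local executor loop on the critical path.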