Efficient, portable implementation of asynchronous multi-place programs

Authors:
Ganesh Bikshandi;Jose G. Castanos;Sreedhar B. Kodali;V. Krishna Nandivada;Igor Peshansky;Vijay A. Saraswat;Sayantan Sur;Pradeep Varma;Tong Wen
Affiliations:
IBM STG, Bangalore, India;IBM T.J. Watson Research Center, Yorktown Heights, NY, USA;IBM STG, Bangalore, India;IBM India Research Lab, New Delhi, India;IBM T.J. Watson Research Center, Hawthorne, NY, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA;IBM India Research Lab, New Delhi, India;Interactive Supercomputing, Boston, MA, USA
Venue:
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2009

Abstract

The X10 programming language is organized around the notion of places (an encapsulation of data and activities operating on the data), partitioned global address space (PGAS), and asynchronous computation and communication. This paper introduces an expressive subset of X10, Flat X10, designed to permit efficient execution across multiple single-threaded places with a simple runtime and without compromising on the productivity of X10. We present the design, implementation and evaluation of a compiler and runtime system for Flat X10. The Flat X10 compiler translates programs into C++ SPMD programs communicating using an active messaging infrastructure. It uses novel techniques to transform explicitly parallel programs into SPMD programs. The runtime system is based on IBM's LAPI (Low-level API) and is easily portable to other libraries such as GASNet and ARMCI. Our implementation realizes performance comparable to hand-written MPI programs for well-known HPC benchmarks such as Random Access, Stream, and FFT, on a Federation-based cluster of Power5 SMPs (with hundreds of processors) and the Blue Gene (with thousands of processors). Submissions based on the work presented in this paper were co-winners of the 2007 and 2008 HPC Challenge Type II Awards.
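The execution model described in the abstract (single-threaded places, each owning one partition of a global address space and communicating through active messages) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the names `World`, `Place`, `send`, `run_until_quiescent`, and `spmd_body` are hypothetical, it does not use LAPI, GASNet, or ARMCI, and it is not the code the Flat X10 compiler generates. It mimics a Random Access style kernel in which every place runs the same body and ships XOR updates to whichever place owns the target table entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// All names here are hypothetical; this is a single-process simulation,
// not the Flat X10 runtime and not a real active-messaging library.
struct Place;
using ActiveMessage = std::function<void(Place&)>;  // handler run at the destination

struct Place {
    int id;
    std::vector<std::uint64_t> table;  // this place's partition of the global table
    std::deque<ActiveMessage> inbox;   // active messages waiting to be handled
};

class World {
public:
    World(int num_places, std::size_t local_size) {
        for (int p = 0; p < num_places; ++p)
            places_.push_back(Place{p, std::vector<std::uint64_t>(local_size, 0), {}});
    }

    // Rough analogue of an X10 `async` targeted at another place:
    // queue a handler to be run by the (single-threaded) destination.
    void send(int dest, ActiveMessage msg) {
        places_[static_cast<std::size_t>(dest)].inbox.push_back(std::move(msg));
    }

    // Rough analogue of an enclosing `finish`: keep draining inboxes
    // until no place has pending work.
    void run_until_quiescent() {
        bool progressed = true;
        while (progressed) {
            progressed = false;
            for (Place& p : places_) {
                while (!p.inbox.empty()) {
                    ActiveMessage msg = std::move(p.inbox.front());
                    p.inbox.pop_front();
                    msg(p);  // the handler touches only data owned by `p`
                    ++delivered_;
                    progressed = true;
                }
            }
        }
    }

    Place& place(int id) { return places_[static_cast<std::size_t>(id)]; }
    int num_places() const { return static_cast<int>(places_.size()); }
    std::size_t delivered() const { return delivered_; }

private:
    std::vector<Place> places_;
    std::size_t delivered_ = 0;
};

// SPMD body: every place executes this same function with its own id.
// Each update is routed to the owner of the global index it targets.
void spmd_body(World& w, int my_id, std::size_t local_size, int updates) {
    const std::size_t global_size = local_size * static_cast<std::size_t>(w.num_places());
    for (int k = 0; k < updates; ++k) {
        // Deterministic stand-in for a pseudo-random global index.
        const std::size_t g = (static_cast<std::size_t>(my_id) * 7919u +
                               static_cast<std::size_t>(k) * 104729u) % global_size;
        const int owner = static_cast<int>(g / local_size);
        const std::size_t offset = g % local_size;
        const std::uint64_t value = static_cast<std::uint64_t>(g) + 1;
        w.send(owner, [offset, value](Place& here) { here.table[offset] ^= value; });
    }
}
```

A driver would call `spmd_body` once per place and then `run_until_quiescent()`. Because the updates are XORs, issuing the same set of updates a second time returns every table entry to zero, which gives a simple self-check for the sketch. A real runtime would run each place in its own process and deliver handlers over the network; the round-robin loop here only stands in for that.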