Parallel Computing with X10

  • Authors:
  • PVR Murthy

  • Affiliations:
  • Siemens, Bangalore, India

  • Venue:
  • Proceedings of the 1st International Workshop on Multicore Software Engineering
  • Year:
  • 2008

Abstract

Many problems require parallel solutions, and how to extract and specify parallelism has been a focus of research over the last few decades. Significant progress has been made in (a) automatically deriving implicit parallelism from functional and logic programs, (b) using parallelizing compilers to extract parallelism from serial Fortran or C programs, mainly by parallelizing loop constructs, and (c) evolving standards such as the Message Passing Interface (MPI) that let a Fortran or C programmer decompose a problem into a parallel solution; nevertheless, the parallel computing problem is still not completely solved. With the emergence of parallel computing architectures based on multi-core chips, existing software must be rewritten, and future software developed, so that the parallelism available at the hardware level is fully exploited.

Modern object-oriented programming languages such as Java and C# support concurrent or distributed execution on two kinds of platforms: (1) a uniprocessor or shared-memory multiprocessor system, on which one or more threads execute against a single shared heap in a single virtual machine, and (2) a loosely coupled distributed computing system, in which each node runs its own virtual machine and communicates with other nodes using protocols such as RMI. Computer systems already consist of, and will increasingly consist of, multicore SMP nodes with non-uniform memory hierarchies, interconnected in horizontally scalable cluster configurations. Because the current high-performance computing programming models support neither non-uniform data access nor tight coupling of distributed nodes, they are ineffective in addressing the needs of such systems. X10 is proposed as a consequence [1, 2]. The target machine for the execution of an X10 program may range from a uniprocessor to a large cluster of parallel processors supporting millions of concurrent operations.
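The first of the two platforms described above — several threads executing against one shared heap in a single JVM — can be sketched in plain Java. The class and method names here are illustrative, not from the paper:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedHeapDemo {
    // All threads increment one counter that lives in the single shared JVM heap.
    static long run(int nThreads, int perThread) throws InterruptedException {
        AtomicLong counter = new AtomicLong();      // one object, visible to every thread
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.incrementAndGet();      // same heap object from every thread
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();          // wait for all threads to finish
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100_000));        // prints 400000
    }
}
```

Communication here is implicit through the shared heap; on the second platform (a loosely coupled cluster), each node would hold its own heap and exchange data explicitly, e.g. over RMI.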
The design goals of X10 are to balance safety, analyzability, scalability, and flexibility. The X10 programming model starts from the serial subset of Java and adds new features so that a suitable expression of parallelism becomes the basis for exploiting modern computer architectures. X10 introduces a Partitioned Global Address Space (PGAS) in which locality materializes in the form of places. To provide a foundation for the language's concurrency constructs, X10 introduces dynamic, asynchronous activities; to support dense and sparse distributed multi-dimensional arrays, it introduces a rich array sub-language. The Java programming model assumes a single uniform heap, which limits its use on non-uniform cluster computing systems: scalability problems have been reported in attempts to automatically map a uniform heap onto a non-uniform cluster. Places in X10 address this scalability issue by letting the programmer decide which objects and activities are co-located. To allow light-weight threads to be created locally or remotely, X10 introduces the notion of asynchronous activities; the corresponding mechanisms in Java are heavyweight. The language constructs async, future, foreach, ateach, finish, clocks, and atomic blocks coordinate the asynchronous activities in an X10 program. The elements of an array are distributed across multiple places in the partitioned global address space according to the array's distribution specification, and the distribution remains unchanged throughout the program's execution. The issues of locality and distribution cannot be hidden from a programmer of high-performance code, and X10 reflects this in its design choices. Sample programs are discussed to illustrate X10's features for implementing concurrent and distributed computations.
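The finish/async/future idiom mentioned above lets a parent activity spawn asynchronous children and block until all of them have terminated. Since X10 itself cannot run here, the following is a rough Java analog built on `java.util.concurrent`; the class name and the sum-of-squares workload are made up for illustration, and the mapping to X10 constructs is only approximate:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FinishAsyncSketch {
    // Rough analog of X10's `finish { for (...) async body; }`:
    // spawn one task per element, then block until every task has completed.
    static long sumOfSquares(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> futures = new ArrayList<>();   // ~ X10 `future`
            for (int i = 1; i <= n; i++) {
                final long v = i;
                futures.add(pool.submit(() -> v * v));        // ~ X10 `async`
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();   // blocking on every child ~ enclosing `finish`
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```

What the analogy cannot show is exactly what places add: in X10, each `async` can be placed at a particular partition of the global address space, whereas all of these Java tasks share one heap in one JVM.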