An extensible global address space framework with decoupled task and data abstractions

  • Authors:
  • Sriram Krishnamoorthy, Umit Catalyurek, Jarek Nieplocha, Atanas Rountev, P. Sadayappan

  • Affiliations:
  • Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH (Krishnamoorthy, Rountev, Sadayappan)
  • Dept. of Biomedical Informatics, The Ohio State University, Columbus, OH (Catalyurek)
  • Computational Sciences and Mathematics, Pacific Northwest National Laboratory, Richland, WA (Nieplocha)

  • Venue:
  • IPDPS '06: Proceedings of the 20th International Conference on Parallel and Distributed Processing
  • Year:
  • 2006

Abstract

Although message passing using MPI is the dominant model for parallel programming today, the significant effort required to develop high-performance MPI applications has prompted the development of several more convenient parallel programming models. Programming models such as Co-Array Fortran, Global Arrays, Titanium, and UPC provide a more convenient global view of the data, but face significant challenges in delivering high performance over a range of applications. It is particularly challenging to achieve high performance using global-address-space languages for unstructured applications with irregular data structures. In this paper, we describe a global-address-space parallel programming framework with decoupled task and data abstractions. The framework centers on the use of task pools, where tasks specify their operands in a distributed, globally addressable pool of data chunks. The data chunks can be addressed in a logical multidimensional "tuple" space, and are distributed among the nodes of the system. Locality-aware load balancing of tasks in the task pool is achieved through judicious mapping via hypergraph partitioning, as well as dynamic task/data migration. The framework implements a transparent interface for out-of-core data, so that explicit orchestration of data movement between disk and memory is not required of the programmer. The use of the framework for the implementation of parallel block-sparse tensor computations in the context of a quantum chemistry application is illustrated.
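To make the decoupled task/data abstraction concrete, the following is a minimal, hypothetical sketch of the idea the abstract describes: data chunks live in a globally addressable pool indexed by logical multidimensional tuples, and tasks name their operands by tuple rather than holding the data directly. All class and function names here are illustrative assumptions, not the framework's actual API; the hash-based chunk placement stands in for the hypergraph-partitioning-based mapping used in the paper.

```python
class ChunkPool:
    """Globally addressable pool of data chunks, indexed by logical
    multidimensional tuples and distributed among nodes (sketch)."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.chunks = {}   # index tuple -> chunk data
        self.owner = {}    # index tuple -> owning node id

    def put(self, index, data):
        # Simple deterministic placement for illustration only; the
        # framework instead maps chunks via hypergraph partitioning so
        # that tasks and their operand chunks are co-located.
        self.chunks[index] = data
        self.owner[index] = hash(index) % self.num_nodes

    def get(self, index):
        # A real implementation would fetch remote (or out-of-core)
        # chunks transparently; here everything is in one process.
        return self.chunks[index]


class Task:
    """A task is decoupled from its data: it records only the index
    tuples of its operand chunks, not the chunks themselves."""

    def __init__(self, fn, operand_indices):
        self.fn = fn
        self.operand_indices = operand_indices


def run_task_pool(tasks, pool):
    """Locality-aware dispatch sketch: assign each task to the node
    owning the majority of its operand chunks, then execute it."""
    results = []
    for task in tasks:
        homes = [pool.owner[i] for i in task.operand_indices]
        node = max(set(homes), key=homes.count)  # majority owner
        args = [pool.get(i) for i in task.operand_indices]
        results.append((node, task.fn(*args)))
    return results
```

A usage sketch in the same spirit: chunks of a 2-D logical space are inserted at tuple indices `(0, 0)` and `(0, 1)`, and a task specifies both as operands; the scheduler resolves the tuples to data and picks an execution node.

```python
pool = ChunkPool(num_nodes=4)
pool.put((0, 0), [1, 2])
pool.put((0, 1), [3, 4])
tasks = [Task(lambda a, b: sum(a) + sum(b), [(0, 0), (0, 1)])]
results = run_task_pool(tasks, pool)  # [(node_id, 10)]
```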