Hierarchical place trees: a portable abstraction for task parallelism and data movement

Authors:
Yonghong Yan; Jisheng Zhao; Yi Guo; Vivek Sarkar
Affiliations:
Department of Computer Science, Rice University (all authors)
Venue:
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Year:
2009


Abstract

Modern computer systems feature multiple homogeneous or heterogeneous computing units with deep memory hierarchies and expect a high degree of thread-level parallelism from the software. Exploiting data locality is critical to achieving scalable parallelism, but it adds a significant dimension of complexity to performance optimization of parallel programs. This is especially true for programming models in which locality is implicit and opaque to programmers. In this paper, we introduce the hierarchical place tree (HPT) model as a portable abstraction for task parallelism and data movement. The HPT model supports co-allocation of data and computation at multiple levels of a memory hierarchy. It can be viewed as a generalization of concepts from the Sequoia and X10 programming models, resulting in capabilities that are not supported by either. Compared to Sequoia, HPT supports three kinds of data movement in a memory hierarchy rather than just explicit data transfer between adjacent levels, as well as dynamic task scheduling rather than static task assignment. Compared to X10, HPT provides a hierarchical notion of places for both computation and data mapping. We describe our work in progress on implementing the HPT model in the Habanero-Java (HJ) compiler and runtime system. Preliminary results on general-purpose multicore processors and GPU accelerators indicate that the HPT model can be a promising portable abstraction for future multicore processors.
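To make the idea of a hierarchy of places concrete, the following is a minimal plain-Java sketch, not the Habanero-Java API. The names `Place`, `spawnAt`, and `nextTaskFor` are hypothetical, and the scheduling rule is an assumption made for illustration: a worker attached to a leaf place looks for work at its own leaf first and then at each ancestor, so a task placed at an inner node may run on any worker beneath that node. Data movement is not modeled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only: Place, spawnAt, and nextTaskFor are hypothetical
// names chosen for this example, not part of Habanero-Java.
class PlaceTreeSketch {

    /** One node of the tree, e.g. main memory, a shared cache, or a core. */
    static final class Place {
        final String name;
        final Place parent;                            // null for the root
        final List<Place> children = new ArrayList<>();
        final Deque<Runnable> tasks = new ArrayDeque<>();

        Place(String name, Place parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Queue a task at a place, which may sit at any level of the hierarchy. */
    static void spawnAt(Place place, Runnable task) {
        place.tasks.addLast(task);
    }

    /**
     * Assumed dynamic scheduling rule: search the worker's leaf place first,
     * then each ancestor up to the root. Returns null when no task is found.
     */
    static Runnable nextTaskFor(Place workerLeaf) {
        for (Place p = workerLeaf; p != null; p = p.parent) {
            Runnable task = p.tasks.pollFirst();
            if (task != null) {
                return task;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A small two-level hierarchy below the root.
        Place root = new Place("memory", null);
        Place l2a = new Place("L2-a", root);
        Place l2b = new Place("L2-b", root);
        Place core0 = new Place("core0", l2a);
        Place core1 = new Place("core1", l2a);
        Place core2 = new Place("core2", l2b);

        List<String> log = new ArrayList<>();
        spawnAt(core0, () -> log.add("leaf task"));
        spawnAt(l2a, () -> log.add("L2-a task"));
        spawnAt(root, () -> log.add("root task"));

        // Single-threaded simulation of three workers, one per leaf place.
        for (Place worker : new Place[] {core2, core1, core0}) {
            Runnable task = nextTaskFor(worker);
            if (task != null) {
                task.run();
            }
        }
        System.out.println(log);
    }
}
```

In this sketch the worker at `core2` finds nothing on its own branch and picks up the task queued at the root, while the worker at `core1` picks up the task queued at its shared parent `L2-a`. A real runtime would use concurrent queues and worker threads; the single-threaded loop only illustrates the lookup order.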