Harmony: an execution model and runtime for heterogeneous many core systems

Authors:
Gregory F. Diamos;Sudhakar Yalamanchili
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA
Venue:
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Year:
2008

Citing 5
Cited 25

A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels

Proceedings of the 30th annual international symposium on Computer architecture
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Merge: a programming model for heterogeneous multi-core systems

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems

GViM: GPU-accelerated virtual machines

Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Embracing heterogeneity: parallel programming for changing hardware

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
A domain-specific approach to heterogeneous parallelism

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Attaining system performance points: revisiting the end-to-end argument in system design for heterogeneous many-core systems

ACM SIGOPS Operating Systems Review
A static task partitioning approach for heterogeneous systems using OpenCL

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
MDR: performance model driven runtime for heterogeneous parallel platforms

Proceedings of the international conference on Supercomputing
Pegasus: coordinated scheduling for virtualized accelerator-based systems

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Improving performance of adaptive component-based dataflow middleware

Parallel Computing
PACUE: processor allocator considering user experience

Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Operating systems should manage accelerators

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Automatic generation of software pipelines for heterogeneous parallel systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ValuePack: value-based scheduling framework for CPU-GPU clusters

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Prius: a runtime for hybrid computing

Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms

Proceedings of the ACM International Conference on Computing Frontiers
Load balancing in a changing world: dealing with heterogeneity and performance variability

Proceedings of the ACM International Conference on Computing Frontiers
Arbiter work stealing for parallelizing games on heterogeneous computing environments

Proceedings of the High Performance Computing Symposium
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
RSVM: a region-based software virtual memory for GPU

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Scheduling concurrent applications on a cluster of CPU-GPU nodes

Future Generation Computer Systems
Red Fox: An Execution Environment for Relational Query Processing on GPUs

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
A Self-tuning Scientific Framework using Model-Driven Engineering for Heterogeneous Execution Platforms

Proceedings of International Workshop on Adaptive Self-tuning Computing Systems
Efficient implementation of data flow graphs on multi-gpu clusters

Journal of Real-Time Image Processing

Abstract

The emergence of heterogeneous many-core architectures presents a unique opportunity for delivering order-of-magnitude performance increases to high-performance applications by matching certain classes of algorithms to specifically tailored architectures. Their ubiquitous adoption, however, has been limited by a lack of programming models and management frameworks designed to reduce the high degree of software development complexity intrinsic to heterogeneous architectures. This paper proposes Harmony, a runtime-supported programming and execution model that provides: (1) semantics for simplifying parallelism management, (2) dynamic scheduling of compute-intensive kernels to heterogeneous processor resources, and (3) online-monitoring-driven performance optimization for heterogeneous many-core systems. We are particularly concerned with simplifying development and ensuring binary portability and scalability across system configurations and sizes. Initial results from ongoing development demonstrate binary compatibility with a variable number of cores, as well as dynamic adaptation of schedules to data sets. We present preliminary results of key features for some benchmark applications.
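
To make items (2) and (3) of the abstract concrete, the following is a minimal sketch of one way a runtime could dispatch kernels to heterogeneous devices and refine its choices from monitored runtimes. It is an illustration only and not Harmony's actual algorithm or API: the names `Device`, `Scheduler`, `dispatch`, and `measure`, the greedy earliest-finish-time policy, the simulated clock, and the sample runtimes are all assumptions introduced here.

```python
class Device:
    """Hypothetical processor resource with a simulated availability time."""

    def __init__(self, name):
        self.name = name
        self.busy_until = 0.0  # simulated time at which the device is free


class Scheduler:
    """Greedy earliest-finish-time dispatch driven by observed runtimes.

    Illustrative assumption only; the abstract does not specify a policy.
    """

    def __init__(self, devices, default_estimate=1.0):
        self.devices = devices
        self.default_estimate = default_estimate
        self.samples = {}  # (kernel, device name) -> list of observed runtimes

    def estimate(self, kernel, device):
        # Mean of past observations, or a default guess before any exist.
        observed = self.samples.get((kernel, device.name))
        if not observed:
            return self.default_estimate
        return sum(observed) / len(observed)

    def dispatch(self, kernel, measure):
        # Pick the device with the smallest predicted finish time.
        best = min(
            self.devices,
            key=lambda d: d.busy_until + self.estimate(kernel, d),
        )
        # "Run" the kernel; measure() stands in for online monitoring.
        runtime = measure(kernel, best.name)
        best.busy_until += runtime
        self.samples.setdefault((kernel, best.name), []).append(runtime)
        return best.name


# Made-up runtimes: the same kernel is four times faster on the "gpu" device.
TRUE_RUNTIME = {("fft", "cpu"): 4.0, ("fft", "gpu"): 1.0}


def measure(kernel, device_name):
    return TRUE_RUNTIME[(kernel, device_name)]


scheduler = Scheduler([Device("cpu"), Device("gpu")])
placements = [scheduler.dispatch("fft", measure) for _ in range(6)]
```

In this sketch the scheduler initially has no information and tries the first device; once a slow runtime is observed there, later kernels shift to the faster device until its queue is long enough that the slower one becomes competitive again.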