The emergence of heterogeneous many-core architectures presents a unique opportunity to deliver order-of-magnitude performance increases to high-performance applications by matching certain classes of algorithms to specifically tailored architectures. Their widespread adoption, however, has been limited by a lack of programming models and management frameworks designed to reduce the high degree of software development complexity intrinsic to heterogeneous architectures. This paper proposes Harmony, a runtime-supported programming and execution model that provides: (1) semantics for simplifying parallelism management, (2) dynamic scheduling of compute-intensive kernels to heterogeneous processor resources, and (3) online-monitoring-driven performance optimization for heterogeneous many-core systems. We are particularly concerned with simplifying development and ensuring binary portability and scalability across system configurations and sizes. Initial results from ongoing development demonstrate binary compatibility across a variable number of cores, as well as dynamic adaptation of schedules to data sets. We present preliminary results of key features for some benchmark applications.