Improving application behavior on heterogeneous manycore systems through kernel mapping

Authors:
Omer Erdil Albayrak;Ismail Akturk;Ozcan Ozturk
Venue:
Parallel Computing
Year:
2013

Abstract

Many-core accelerators are increasingly deployed to improve system processing capabilities. In such systems, application mapping must be enhanced to maximize utilization of the underlying architecture. In particular, on systems with graphics processing units (GPUs), how the kernels of a multi-kernel application are mapped has a great impact on overall performance, since kernels may exhibit different characteristics on different CPUs and GPUs. While some kernels run faster on GPUs, others may perform better on CPUs. Thus, heterogeneous execution may yield better performance than executing the application only on a CPU or only on a GPU. In this paper, we investigate two approaches: a novel profiling-based adaptive kernel mapping algorithm that assigns each kernel of an application to the proper device, and a Mixed-Integer Programming (MIP) implementation that determines the optimal mapping. We use profiling information for kernels on different devices to generate a map that identifies which kernel should run where in order to improve the overall performance of an application. Initial experiments show that our approach can efficiently map kernels onto CPUs and GPUs and outperforms CPU-only and GPU-only approaches.
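To make the mapping problem concrete, the following is a minimal sketch and not the authors' algorithm or their MIP formulation. It assumes a sequential chain of kernels, a hypothetical per-kernel profile of execution times on each device, and a fixed hypothetical transfer penalty whenever consecutive kernels run on different devices. The names `total_time`, `greedy_map`, and `exhaustive_map` and all numbers are illustrative assumptions; the exhaustive search merely stands in for an exact solver on small inputs.

```python
from itertools import product

DEVICES = ("cpu", "gpu")


def total_time(mapping, profile, transfer_cost):
    """Profiled kernel times plus a penalty for each device switch.

    Assumption: kernels run one after another, and moving data between
    devices costs a fixed `transfer_cost` per switch.
    """
    compute = sum(profile[k][dev] for k, dev in enumerate(mapping))
    switches = sum(1 for a, b in zip(mapping, mapping[1:]) if a != b)
    return compute + switches * transfer_cost


def greedy_map(profile):
    """Pick the fastest device for each kernel, ignoring transfers."""
    return tuple(min(DEVICES, key=lambda d: times[d]) for times in profile)


def exhaustive_map(profile, transfer_cost):
    """Try every assignment; exponential, so only viable for few kernels."""
    candidates = product(DEVICES, repeat=len(profile))
    return min(candidates, key=lambda m: total_time(m, profile, transfer_cost))


# Hypothetical profiling data: seconds per kernel on each device.
profile = [
    {"cpu": 1.0, "gpu": 5.0},
    {"cpu": 3.0, "gpu": 2.0},
    {"cpu": 2.0, "gpu": 2.5},
    {"cpu": 8.0, "gpu": 2.0},
]
transfer_cost = 1.0  # assumed cost of one CPU<->GPU switch

if __name__ == "__main__":
    for name, mapping in [
        ("cpu-only", ("cpu",) * len(profile)),
        ("gpu-only", ("gpu",) * len(profile)),
        ("greedy", greedy_map(profile)),
        ("exhaustive", exhaustive_map(profile, transfer_cost)),
    ]:
        print(name, mapping, total_time(mapping, profile, transfer_cost))
```

With these made-up numbers, a mixed CPU and GPU mapping beats both single-device runs, and accounting for transfer cost changes which mixed mapping is best. That is the kind of trade-off a profiling-based mapper or an exact formulation has to weigh.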