Integer and combinatorial optimization
Integer and combinatorial optimization
The NAS parallel benchmarks—summary and preliminary results
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Hardware-Software Cosynthesis for Digital Systems
IEEE Design & Test
Using modern graphics architectures for general-purpose computing: a framework and analysis
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Platforms
IEEE Transactions on Parallel and Distributed Systems
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
Concurrency and Computation: Practice & Experience - Euro-Par 2009
A static task partitioning approach for heterogeneous systems using OpenCL
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
SAAHPC '11 Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing
Architecture-Aware Mapping and Optimization on a 1600-Core GPU
ICPADS '11 Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
Performance characterization of the NAS Parallel Benchmarks in OpenCL
IISWC '11 Proceedings of the 2011 IEEE International Symposium on Workload Characterization
Heterogeneous Task Scheduling for Accelerated OpenMP
IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Hi-index | 0.00 |
Many-core accelerators are being more frequently deployed to improve the system processing capabilities. In such systems, application mapping must be enhanced to maximize utilization of the underlying architecture. Especially, in graphics processing units (GPUs), mapping kernels that are part of multi-kernel applications has a great impact on overall performance, since kernels may exhibit different characteristics on different CPUs and GPUs. While some kernels run faster on GPUs, others may perform better in CPUs. Thus, heterogeneous execution may yield better performance than executing the application only on a CPU or only on a GPU. In this paper, we investigate on two approaches: a novel profiling-based adaptive kernel mapping algorithm to assign each kernel of an application to the proper device, and a Mixed-Integer Programming (MIP) implementation to determine optimal mapping. We utilize profiling information for kernels on different devices and generate a map that identifies which kernel should run where in order to improve the overall performance of an application. Initial experiments show that our approach can efficiently map kernels on CPUs and GPUs, and outperforms CPU-only and GPU-only approaches.