CPU+GPU scheduling with asymptotic profiling

Authors:
Zhenning Wang; Long Zheng; Quan Chen; Minyi Guo
Venue:
Parallel Computing
Year:
2014

Abstract

Hybrid systems with CPUs and GPUs have become a new standard in high-performance computing. In such systems, a data-parallel workload can be split and distributed between the CPU and the GPU so that both devices are utilized. Splitting and distributing the workload manually is challenging, however, because GPU performance is sensitive to the amount of work the GPU receives. Current dynamic schedulers therefore rebalance the workload between CPU and GPU periodically at runtime. This periodic rebalancing requires frequent synchronizations between CPU and GPU, and the synchronization overhead often degrades overall performance. To solve this problem, we propose a Co-Scheduling Strategy Based on Asymptotic Profiling (CAP). CAP dynamically splits and distributes the workload between CPU and GPU with only a few synchronizations. It uses profiling to predict the performance of each device and partitions the workload according to the predicted performance. It is also optimized for the GPU's performance characteristics. We evaluate our proof-of-concept system on six benchmarks, and the results show that CAP achieves an average performance improvement of up to 42.7% compared with state-of-the-art co-scheduling strategies.
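The abstract does not spell out CAP's partitioning rule. One plausible reading, offered only as a sketch, is the following. Each device is profiled on geometrically growing chunks until its measured throughput stops changing; this is its "asymptotic" throughput. The scheduler then synchronizes once and gives each device a share of the remaining items proportional to that throughput, so both are predicted to finish at the same time. The doubling schedule, the 5% convergence tolerance, and the `cpu_time`/`gpu_time` timing models below are hypothetical assumptions for illustration. They are not the paper's method or measured data.

```python
from typing import Callable


# Hypothetical timing models (seconds to process `items` items), used only to
# make the sketch runnable. In a real system these would be measured runs.
def gpu_time(items: int) -> float:
    # Assumed fixed launch/transfer overhead plus a per-item cost: small chunks
    # cannot amortize the overhead, so measured throughput grows with chunk size.
    return 0.002 + items / 5_000_000


def cpu_time(items: int) -> float:
    # Assumed flat per-item cost: CPU throughput barely depends on chunk size.
    return items / 1_000_000


def asymptotic_profile(run: Callable[[int], float], start: int = 1024,
                       tol: float = 0.05, max_rounds: int = 20) -> tuple[float, int]:
    """Double the chunk size until throughput changes by less than `tol`.

    Returns (estimated asymptotic throughput in items/s, items consumed while
    profiling). The profiled items are real work, so they count toward the total.
    """
    size, consumed = start, 0
    prev_thr = None
    thr = 0.0
    for _ in range(max_rounds):
        elapsed = run(size)
        consumed += size
        thr = size / elapsed
        if prev_thr is not None and abs(thr - prev_thr) / prev_thr < tol:
            break  # throughput has (approximately) converged
        prev_thr = thr
        size *= 2
    return thr, consumed


def split(total: int, cpu_thr: float, gpu_thr: float) -> tuple[int, int]:
    """Split `total` items so both devices are predicted to finish together.

    Equal finish time means cpu_items / cpu_thr == gpu_items / gpu_thr, which
    gives each device a share proportional to its throughput.
    """
    gpu_items = round(total * gpu_thr / (cpu_thr + gpu_thr))
    return total - gpu_items, gpu_items


if __name__ == "__main__":
    total = 100_000_000
    # A real scheduler would profile both devices concurrently; here they are
    # profiled one after the other for simplicity.
    cpu_thr, cpu_used = asymptotic_profile(cpu_time)
    gpu_thr, gpu_used = asymptotic_profile(gpu_time)
    # Single synchronization point: partition everything that is left.
    remaining = total - cpu_used - gpu_used
    cpu_items, gpu_items = split(remaining, cpu_thr, gpu_thr)
    print(f"CPU gets {cpu_items} items, GPU gets {gpu_items} items")
    print(f"predicted time: CPU {cpu_time(cpu_items):.2f} s, "
          f"GPU {gpu_time(gpu_items):.2f} s")
```

Sizing the split once, after the profiling phase, rather than rebalancing at fixed intervals, is what keeps the number of CPU/GPU synchronizations small in this sketch. The doubling schedule matters mainly for the GPU, whose throughput on small chunks underestimates its throughput at full occupancy.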