Hybrid systems combining CPUs and GPUs have become the new standard in high-performance computing. To exploit data parallelism on such systems, a workload is split into two parts and distributed across the devices so that both CPU and GPU are utilized. However, balancing the workload between CPU and GPU manually is difficult, because GPU performance is sensitive to problem size. Current dynamic schedulers therefore rebalance the workload between CPU and GPU periodically. Each periodic rebalancing step requires a CPU-GPU synchronization, and these frequent synchronizations often degrade overall performance. To address this problem, we propose CAP, a Co-scheduling strategy based on Asymptotic Profiling. CAP dynamically splits a task's workload between CPU and GPU and uses profiling to predict the workload of the next partition. It is tailored to the GPU's performance characteristics and balances the load between CPU and GPU with only a few synchronizations. We evaluate our proof-of-concept system on four benchmarks, and the results show that CAP improves performance by up to 45.1% over the state-of-the-art co-scheduling strategy.
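The core idea of asymptotic profiling can be illustrated with a minimal sketch: partition sizes grow geometrically, so the scheduler needs only O(log n) synchronization points, while each partition is split between the devices in proportion to their most recently measured throughputs. All names and parameters below (`run_cpu`, `run_gpu`, `first_chunk`, `growth`) are hypothetical illustrations, not the paper's actual implementation.

```python
def cap_schedule(total_work, run_cpu, run_gpu, first_chunk=0.05, growth=2.0):
    """Sketch of asymptotic-profiling co-scheduling (hypothetical API).

    run_cpu / run_gpu take a number of work items, execute them on the
    respective device, and return the elapsed time in seconds.  Returns
    the number of CPU-GPU synchronizations performed.
    """
    remaining = total_work
    chunk = max(1, int(total_work * first_chunk))  # small initial partition
    cpu_rate = gpu_rate = 1.0                      # start with an even split
    syncs = 0
    while remaining > 0:
        chunk = min(chunk, remaining)
        # Split this partition in proportion to the measured throughputs.
        gpu_share = int(chunk * gpu_rate / (cpu_rate + gpu_rate))
        cpu_share = chunk - gpu_share
        t_cpu = run_cpu(cpu_share) if cpu_share else 0.0
        t_gpu = run_gpu(gpu_share) if gpu_share else 0.0
        # Refine the throughput estimates from the observed timings.
        if cpu_share and t_cpu > 0:
            cpu_rate = cpu_share / t_cpu
        if gpu_share and t_gpu > 0:
            gpu_rate = gpu_share / t_gpu
        remaining -= chunk
        chunk = int(chunk * growth)  # asymptotically larger partitions
        syncs += 1                   # one CPU-GPU synchronization per round
    return syncs
```

With a doubling `growth` factor, 1,000 work items on a GPU ten times faster than the CPU finish after only five synchronization rounds, in contrast to a fixed-chunk dynamic scheduler, which would synchronize once per chunk throughout the run.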