Heterogeneous architectures comprising a multicore CPU and one or more many-core GPUs are increasingly used in cluster and cloud environments. In this paper, we study the problem of optimizing the overall throughput of a set of applications deployed on a cluster of such heterogeneous nodes. We consider two scheduling formulations. In the first, each job can be executed on either the GPU or the CPU of a single node. In the second, each job can be executed on the CPU, the GPU, or both, of any number of nodes in the system. We have developed scheduling schemes that address both problems. In our evaluation, we first show that the schemes proposed for the first formulation outperform a blind round-robin scheduler and approach the performance of an ideal scheduler, which requires an impractical exhaustive exploration of all possible schedules. Next, we show that the scheme proposed for the second formulation outperforms the best of the existing schemes for heterogeneous clusters, TORQUE and MCT, by up to 42%.
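To make the first formulation concrete, the following is a minimal sketch of two baseline policies for jobs that run on either the CPU or the GPU of a single node: a blind round-robin assignment and the generic minimum-completion-time (MCT) heuristic, which places each job, in arrival order, on the device that would finish it earliest. It is not the scheduling scheme proposed in the paper. The `Job` class, the helper names, and all runtimes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_time: float  # hypothetical runtime on a node's CPU
    gpu_time: float  # hypothetical runtime on a node's GPU


def make_devices(num_nodes):
    """One CPU slot and one GPU slot per node (an assumption of this sketch)."""
    return [(node, kind) for node in range(num_nodes) for kind in ("cpu", "gpu")]


def runtime(job, device):
    """Runtime of a job on the given (node, kind) device."""
    return job.cpu_time if device[1] == "cpu" else job.gpu_time


def round_robin(jobs, devices):
    """Blind round-robin: cycle through devices, ignoring runtimes."""
    ready = {d: 0.0 for d in devices}
    for i, job in enumerate(jobs):
        d = devices[i % len(devices)]
        ready[d] += runtime(job, d)
    return max(ready.values())  # makespan


def mct(jobs, devices):
    """Generic MCT: put each job on the device that completes it earliest."""
    ready = {d: 0.0 for d in devices}
    for job in jobs:
        d = min(devices, key=lambda dev: ready[dev] + runtime(job, dev))
        ready[d] += runtime(job, d)
    return max(ready.values())  # makespan


# Hypothetical workload on a single CPU+GPU node.
jobs = [Job("a", 10, 2), Job("b", 4, 8), Job("c", 12, 3), Job("d", 5, 5)]
devices = make_devices(1)

print(round_robin(jobs, devices))  # 22.0
print(mct(jobs, devices))          # 9.0
```

On this toy workload the runtime-aware policy finishes in 9 time units against 22 for round-robin, which illustrates why a scheduler that ignores per-device runtimes is a weak baseline on heterogeneous nodes. An exhaustive search over all assignments would give the ideal schedule, but its cost grows exponentially with the number of jobs.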