SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Processor Sharing Queueing Models of Mixed Scheduling Disciplines for Time Shared System
Journal of the ACM (JACM)
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Performance characteristics of gang scheduling in multiprogrammed environments
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling
IEEE Transactions on Parallel and Distributed Systems
An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration
IEEE Transactions on Parallel and Distributed Systems
System noise, OS clock ticks, and fine-grained parallel applications
Proceedings of the 19th annual international conference on Supercomputing
Adaptive scheduling with parallelism feedback
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
ULE: a modern scheduler for FreeBSD
BSDC'03 Proceedings of the BSD Conference 2003 on BSD Conference
Backfilling Using System-Generated Predictions Rather than User Runtime Estimates
IEEE Transactions on Parallel and Distributed Systems
Implementing lottery scheduling: matching the specializations in traditional schedulers
ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Characterizing application sensitivity to OS interference using kernel-level noise injection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
PAM: a novel performance/power aware meta-scheduler for multi-core systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A dynamic scheduler for balancing HPC applications
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Efficient and scalable multiprocessor fair scheduling using distributed weighted round-robin
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Tessellation: space-time partitioning in a manycore client OS
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Optimizing collective communication on multicores
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Juggle: proactive load balancing on multicore computers
Proceedings of the 20th international symposium on High performance distributed computing
Improving job scheduling algorithms in a grid environment
Future Generation Computer Systems
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
Dynamic threshold for imbalance assessment on load balancing for multicore systems
Computers and Electrical Engineering
Uncovering CPU load balancing policies with harmony
Proceedings of the ACM International Conference on Computing Frontiers
Load balancing non-uniform parallel computations
Proceedings of the 2013 workshop on Programming based on actors, agents, and decentralized control
Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not be able to achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications where threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on ``faster'' and ``slower'' cores. We provide a user level implementation of speed balancing on UMA and NUMA multi-socket architectures running Linux and discuss behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing when compared to the native Linux load balancing improves performance and provides good performance isolation in all cases considered. Speed balancing is also able to provide comparable or better performance than DWRR, a fair multi-processor scheduling implementation inside the Linux kernel. Furthermore, parallel application performance is often determined by the implementation of synchronization operations and speed balancing alleviates the need for tuning the implementations of such primitives.