BWS: balanced work stealing for time-sharing multicores

Authors:
Xiaoning Ding;Kaibo Wang;Phillip B. Gibbons;Xiaodong Zhang
Affiliations:
Intel Labs, Pittsburgh, PA, USA;The Ohio State University, Columbus, OH, USA;Intel Labs, Pittsburgh, PA, USA;The Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 7th ACM European Conference on Computer Systems
Year:
2012

Abstract

Running multithreaded programs on multicore systems has become common practice in many application domains. Work stealing is a widely adopted and effective approach for managing and scheduling the concurrent tasks of such programs. Existing work-stealing schedulers, however, are not effective when multiple applications time-share a single multicore: their management of steal-attempting threads often causes unbalanced system effects that hurt both workload throughput and fairness. In this paper, we present BWS (Balanced Work Stealing), a work-stealing scheduler for time-sharing multicore systems that leverages new, lightweight operating system support. BWS improves system throughput and fairness in two ways. First, it monitors and controls the number of awake, steal-attempting threads for each application, so as to balance the costs (resources consumed in steal attempts) and benefits (available tasks get promptly stolen) of such threads. Second, a steal-attempting thread can yield its core directly to a peer thread with an unfinished task, so as to retain the core for that application and put it to better use. We have implemented a prototype of BWS based on Cilk++, a state-of-the-art work-stealing scheduler. Our performance evaluation with various sets of concurrent applications demonstrates the advantages of BWS over Cilk++, with average system throughput increased by 12.5% and average unfairness decreased from 124% to 20%.
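
The two mechanisms named in the abstract can be illustrated with a small decision routine. This is a minimal sketch written for illustration only, not the BWS implementation: the names `Worker`, `State`, `on_failed_steal`, and the threshold `MAX_FAILED_STEALS` are all hypothetical, and the real scheduler relies on operating system support that plain Python cannot model. The sketch assumes a thief first tries to hand its core to a preempted peer of the same application, and otherwise goes to sleep after a fixed number of failed steal attempts.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    RUNNING = auto()    # on a core, executing a task
    STEALING = auto()   # on a core, looking for a task to steal
    PREEMPTED = auto()  # holds an unfinished task but has lost its core
    SLEEPING = auto()   # off the core, not attempting steals


@dataclass
class Worker:
    wid: int
    state: State
    failed_steals: int = 0


# Hypothetical tuning knob; this value does not come from the paper.
MAX_FAILED_STEALS = 3


def on_failed_steal(thief: Worker, peers: List[Worker]) -> Optional[Worker]:
    """Decide what a thief does after one unsuccessful steal attempt.

    Returns the peer that receives the thief's core, or None if no
    hand-off took place.
    """
    thief.failed_steals += 1

    # Second mechanism: yield the core directly to a peer of the same
    # application that has an unfinished task but is not scheduled.
    for peer in peers:
        if peer.state is State.PREEMPTED:
            peer.state = State.RUNNING
            # Simplifying assumption: the thief is taken off the core
            # and modelled as sleeping after the hand-off.
            thief.state = State.SLEEPING
            thief.failed_steals = 0
            return peer

    # First mechanism: limit the number of awake thieves by putting a
    # thief to sleep once its steal attempts keep failing.
    if thief.failed_steals >= MAX_FAILED_STEALS:
        thief.state = State.SLEEPING
    return None
```

Calling `on_failed_steal(thief, peers)` with a preempted peer in `peers` returns that peer and marks it as running. With only busy peers, the thief stays awake until its failure count reaches the assumed threshold and is then put to sleep.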