Running multithreaded programs on multicore systems has become common practice in many application domains. Work stealing is a widely adopted and effective approach for managing and scheduling the concurrent tasks of such programs. Existing work-stealing schedulers, however, are not effective when multiple applications time-share a single multicore: their management of steal-attempting threads often causes unbalanced system effects that hurt both workload throughput and fairness. In this paper, we present BWS (Balanced Work Stealing), a work-stealing scheduler for time-sharing multicore systems that leverages new, lightweight operating system support. BWS improves system throughput and fairness in two ways. First, it monitors and controls the number of awake, steal-attempting threads for each application, so as to balance the costs of such threads (the resources consumed in steal attempts) against their benefits (available tasks are stolen promptly). Second, a steal-attempting thread can yield its core directly to a peer thread with an unfinished task, so as to retain the core for that application and put it to better use. We have implemented a prototype of BWS based on Cilk++, a state-of-the-art work-stealing scheduler. Our performance evaluation with various sets of concurrent applications demonstrates the advantages of BWS over Cilk++, with average system throughput increased by 12.5% and average unfairness decreased from 124% to 20%.
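The two mechanisms described in the abstract can be illustrated with a minimal C++ sketch of the decision a steal-attempting thread might make after a failed steal. This is not the BWS implementation or the Cilk++ API. The names `WorkerState`, `ThiefAction`, `Decision`, `after_failed_steal`, and the `max_awake_thieves` limit are all hypothetical. The `Preempted` state (a peer that holds an unfinished task but is not currently running) is an assumption about how such a peer might be identified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-worker states, for illustration only.
enum class WorkerState { Working, Preempted, Stealing, Sleeping };

// What a thief does after a failed steal attempt.
enum class ThiefAction { KeepStealing, YieldToPeer, Sleep };

struct Decision {
    ThiefAction action;
    std::size_t peer;  // meaningful only when action == YieldToPeer
};

// Sketch of the policy for one application's workers.
// `max_awake_thieves` is an assumed tunable, not a value from the paper.
Decision after_failed_steal(const std::vector<WorkerState>& workers,
                            std::size_t self,
                            std::size_t max_awake_thieves) {
    std::size_t awake_thieves = 1;  // count the calling thief itself
    for (std::size_t i = 0; i < workers.size(); ++i) {
        if (i == self) continue;
        // Second mechanism: hand the core to a peer that has an unfinished
        // task but is not running, so the core stays with this application.
        if (workers[i] == WorkerState::Preempted) {
            return {ThiefAction::YieldToPeer, i};
        }
        if (workers[i] == WorkerState::Stealing) ++awake_thieves;
    }
    // First mechanism: bound the number of awake thieves so that steal
    // attempts do not consume cores other applications could use.
    if (awake_thieves > max_awake_thieves) {
        return {ThiefAction::Sleep, self};
    }
    return {ThiefAction::KeepStealing, self};
}
```

In this sketch, yielding to a peer takes priority over sleeping. That is a design choice made for the example rather than something the abstract specifies.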