Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Executing multithreaded programs efficiently
Executing multithreaded programs efficiently
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Proceedings of the ACM 2000 conference on Java Grande
Scheduling Cilk multithreaded parallel programs on processors of different speeds
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Compiler-directed dynamic voltage/frequency scheduling for energy reduction in microprocessors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Power efficiency of voltage scaling in multiple clock, multiple voltage cores
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Compile-time dynamic voltage scaling settings: opportunities and limits
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Voltage and Frequency Control With Adaptive Reaction Time in Multiple-Clock-Domain Processors
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Balancing power consumption in multiprocessor systems
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Adaptive work stealing with parallelism feedback
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling for reduced CPU energy
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Dynamic voltage frequency scaling for multi-tasking systems using online learning
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
Eon: a language and runtime system for perpetual systems
Proceedings of the 5th international conference on Embedded networked sensor systems
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Thread motion: fine-grained power management for multi-core systems
Proceedings of the 36th annual international symposium on Computer architecture
Runtime support for multicore Haskell
Proceedings of the 14th ACM SIGPLAN international conference on Functional programming
The design of a task parallel library
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
An adaptive task creation strategy for work-stealing scheduling
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Green: a framework for supporting energy-conscious programming using controlled approximation
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
EnerJ: approximate data types for safe and general low-power computation
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
A generic parallel collection framework
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Architecture support for disciplined approximate programming
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
BWS: balanced work stealing for time-sharing multicores
Proceedings of the 7th ACM european conference on Computer Systems
Work-stealing without the baggage
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Scheduling parallel programs by work stealing with private deques
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Work stealing is a promising approach to constructing multithreaded program runtimes of parallel programming languages. This paper presents HERMES, an energy-efficient work-stealing language runtime. The key insight is that threads in a work-stealing environment -- thieves and victims - have varying impacts on the overall program running time, and a coordination of their execution "tempo" can lead to energy efficiency with minimal performance loss. The centerpiece of HERMES is two complementary algorithms to coordinate thread tempo: the workpath-sensitive algorithm determines tempo for each thread based on thief-victim relationships on the execution path, whereas the workload-sensitive algorithm selects appropriate tempo based on the size of work-stealing deques. We construct HERMES on top of Intel Cilk Plus's runtime, and implement tempo adjustment through standard Dynamic Voltage and Frequency Scaling (DVFS). Benchmarks running on HERMES demonstrate an average of 11-12% energy savings with an average of 3-4% performance loss through meter-based measurements over commercial CPUs.