An approach to resource-aware co-scheduling for CMPs

Authors:
Major Bhadauria;Sally A. McKee
Affiliations:
Cornell University, Ithaca, NY;Chalmers University of Technology, Göteborg, Sweden
Venue:
Proceedings of the 24th ACM International Conference on Supercomputing
Year:
2010

Citing 22
Cited 15

Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Maximizing memory bandwidth for streamed computations

Maximizing memory bandwidth for streamed computations
Improving Gang Scheduling through job performance analysis and malleability

ICS '01 Proceedings of the 15th international conference on Supercomputing
Comparing Gang Scheduling with Dynamic Space Sharing on Symmetric Multiprocessors Using Automatic Self-Allocating Threads (ASAT)

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Effects of Memory Performance on Parallel Job Scheduling

JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
Adaptive Parallel Job Scheduling with Flexible Coscheduling

IEEE Transactions on Parallel and Distributed Systems
Fair Queuing Memory Systems

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Performance of multithreaded chip multiprocessors and implications for operating system design

ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Virtual private caches

Proceedings of the 34th annual international symposium on Computer architecture
Performance-driven processor allocation

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Analysis of dynamic voltage/frequency scaling in chip-multiprocessors

ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs

Proceedings of the 5th conference on Computing frontiers
PAM: a novel performance/power aware meta-scheduler for multi-core systems

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A dynamic scheduler for balancing HPC applications

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Identifying energy-efficient concurrency levels using machine learning

CLUSTER '07 Proceedings of the 2007 IEEE International Conference on Cluster Computing
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

The impact of memory subsystem resource sharing on datacenter applications

Proceedings of the 38th annual international symposium on Computer architecture
Contentiousness vs. sensitivity: improving contention aware runtime systems on multicore architectures

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Adapt or become extinct!: the case for a unified framework for deployment-time optimization (position paper)

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Chameleon: operating system support for dynamic processors

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Courteous cache sharing: being nice to others in capacity management

Proceedings of the 49th Annual Design Automation Conference
Compiling for niceness: mitigating contention for QoS in warehouse scale computers

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Scalability-based manycore partitioning

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Resource-freeing attacks: improve your cloud performance (at your neighbor's expense)

Proceedings of the 2012 ACM conference on Computer and communications security
Measuring interference between live datacenter applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ADAPT: A framework for coscheduling multithreaded programs

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Reducing the energy cost of computing through efficient co-scheduling of parallel workloads

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Coordinated power-performance optimization in manycores

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
ReSense: Mapping dynamic workloads of colocated multithreaded applications using resource sensitivity

ACM Transactions on Architecture and Code Optimization (TACO)
DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures

Proceedings of Programming Models and Applications on Multicores and Manycores
Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

We develop real-time scheduling techniques for improving performance and energy for multiprogrammed workloads that scale non-uniformly with increasing thread counts. Multithreaded programs generally deliver higher throughput than single-threaded programs on chip multiprocessors, but performance gains from increasing threads decrease when there is contention for shared resources. We use analytic metrics to derive local search heuristics for creating efficient multiprogrammed, multithreaded workload schedules. Programs are allocated fewer cores than requested, and scheduled to space-share the CMP to improve global throughput. Our holistic approach attempts to co-schedule programs that complement each other with respect to shared resource consumption. We find application co-scheduling for performance and energy in a resource-aware manner achieves better results than solely targeting total throughput or concurrently co-scheduling all programs. Our schedulers improve overall energy delay (E*D) by a factor of 1.5 over time-multiplexed gang scheduling.