Fast and fair: data-stream quality of service

Authors:
Thomas Y. Yeh;Glenn Reinman
Affiliations:
University of California, Los Angeles, CA;University of California, Los Angeles, CA
Venue:
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2005

Abstract

Chip multiprocessors (CMPs) have the potential to exploit thread-level parallelism, particularly in embedded server farms, where the number of available threads can be quite high. Unfortunately, both per-core and overall throughput are significantly affected by the organization of the lowest-level on-chip cache. On-chip caches for CMPs must handle the increased demand and contention of multiple cores. To complicate the problem, cache demand changes dynamically with phase changes, context switches, power-saving features, and assignments to asymmetric cores.

We propose PDAS, a distributed NUCA L2 cache design with an adaptive sharing mechanism. Each core independently measures its dynamic need, and all cache resources are managed to increase utilization, reduce migrations, and lower interference. Per-core performance degradation is bounded while overall throughput is optimized, thus qualitatively improving the performance of embedded systems where quality of service is an important characteristic.

In single-thread mode, PDAS improves performance on average by 26%, 27%, and 13% over Private, Shared, and NUCA caches, respectively. This improvement is achieved while reducing internal migrations by 82% on average compared to NUCA. With thread contention, PDAS increases its performance and power advantage over prior work. The average migration reduction over NUCA rises to over 90%, and average IPC improvements over NUCA are 30%, 14%, and 35% for the 2T, 3T, and 4T scenarios, respectively.