Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip

Authors:
Boris Grot;Stephen W. Keckler;Onur Mutlu
Affiliations:
The University of Texas at Austin;The University of Texas at Austin;Carnegie Mellon University
Venue:
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2009

Citing 17
Cited 27

Analysis and simulation of a fair queueing algorithm

SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
Virtual clock: a new traffic control algorithm for packet switching networks

SIGCOMM '90 Proceedings of the ACM symposium on Communications architectures & protocols
Rotating combined queueing (RCQ): bandwidth and latency guarantees in low-cost, high-performance networks

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Design and analysis of frame-based fair queueing: a new traffic scheduling algorithm for packet-switched networks

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Throttle and Preempt: A New Flow Control for Real-Time Communications in Wormhole Networks

ICPP '97 Proceedings of the international Conference on Parallel Processing
Supporting Preemption in Wormhole Networks

COMPSAC '99 23rd International Computer Software and Applications Conference
A Delay Model and Speculative Architecture for Pipelined Routers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
The M5 Simulator: Modeling Networked Systems

IEEE Micro
Fair Queuing Memory Systems

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual private caches

Proceedings of the 34th annual international symposium on Computer architecture
On-Chip Interconnection Architecture of the Tile Processor

IEEE Micro
Research Challenges for On-Chip Interconnection Networks

IEEE Micro
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration

Proceedings of the Conference on Design, Automation and Test in Europe

Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Aérgia: exploiting packet latency slack in on-chip networks

Proceedings of the 37th annual international symposium on Computer architecture
Approximating age-based arbitration in on-chip networks

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Contention-Aware Scheduling on Multicore Systems

ACM Transactions on Computer Systems (TOCS)
Thread criticality support in on-chip networks

Proceedings of the Third International Workshop on Network on Chip Architectures
Netrace: dependency-driven trace-based network-on-chip simulation

Proceedings of the Third International Workshop on Network on Chip Architectures
LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Enabling quality-of-service in nanophotonic network-on-chip

Proceedings of the 16th Asia and South Pacific Design Automation Conference
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs

Journal of Parallel and Distributed Computing
Prefetch-aware shared resource management for multi-core systems

Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees

Proceedings of the 38th annual international symposium on Computer architecture
FeatherWeight: low-cost optical arbitration with QoS support

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems

ACM Transactions on Computer Systems (TOCS)
Topology-Aware quality-of-service support in highly integrated chip multiprocessors

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
On-chip networks from a networking perspective: congestion and scalability in many-core interconnects

Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Dynamic QoS management for chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Globally Synchronized Frames for guaranteed quality-of-service in on-chip networks

Journal of Parallel and Distributed Computing
Application-aware prefetch prioritization in on-chip networks

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Application-to-core mapping policies to reduce memory interference in multi-core systems

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
On-chip networks from a networking perspective: congestion and scalability in many-core interconnects

ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Toward on-chip datacenters: a perspective on general trends and on-chip particulars

The Journal of Supercomputing
Cost-effective contention avoidance in a CMP with shared memory controllers

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Moths: Mobile threads for on-chip networks

ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
SurfNoC: a low latency and provably non-interfering approach to secure networks-on-chip

Proceedings of the 40th Annual International Symposium on Computer Architecture
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Providing multiple hard latency and throughput guarantees for packet switching networks on chip

Computers and Electrical Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

Future many-core chip multiprocessors (CMPs) and systems-on-a-chip (SOCs) will have numerous processing elements executing multiple applications concurrently. These applications and their respective threads will interfere at the on-chip network level and compete for shared resources such as cache banks, memory controllers, and specialized accelerators. Often, the communication and sharing patterns of these applications will be impossible to predict off-line, making fairness guarantees and performance isolation difficult through static thread and link scheduling. Prior techniques for providing network quality-of-service (QOS) have too much algorithmic complexity, cost (area and/or energy) or performance overhead to be attractive for on-chip implementation. To better understand the preferred solution space, we define desirable features and evaluation metrics for QOS in a network-on-a-chip (NOC). Our insights lead us to propose a novel QOS system called Preemptive Virtual Clock (PVC). PVC provides strong guarantees, reduces packet delay variation, and enables efficient reclamation of idle network bandwidth without per-flow buffering at the routers and with minimal buffering at the source nodes. PVC averts priority inversion through preemption of lower-priority packets. By controlling preemption aggressiveness, PVC enables a trade-off between the strength of the guarantees and overall throughput. Finally, PVC simplifies network management through a flexible allocation mechanism that enables per-application bandwidth provisioning independent of thread count and supports transparent bandwidth recycling among an application's threads.