Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip

  • Authors:
  • Boris Grot;Stephen W. Keckler;Onur Mutlu

  • Affiliations:
  • The University of Texas at Austin;The University of Texas at Austin;Carnegie Mellon University

  • Venue:
  • Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Future many-core chip multiprocessors (CMPs) and systems-on-a-chip (SOCs) will have numerous processing elements executing multiple applications concurrently. These applications and their respective threads will interfere at the on-chip network level and compete for shared resources such as cache banks, memory controllers, and specialized accelerators. Often, the communication and sharing patterns of these applications will be impossible to predict off-line, making fairness guarantees and performance isolation difficult through static thread and link scheduling. Prior techniques for providing network quality-of-service (QOS) have too much algorithmic complexity, cost (area and/or energy) or performance overhead to be attractive for on-chip implementation. To better understand the preferred solution space, we define desirable features and evaluation metrics for QOS in a network-on-a-chip (NOC). Our insights lead us to propose a novel QOS system called Preemptive Virtual Clock (PVC). PVC provides strong guarantees, reduces packet delay variation, and enables efficient reclamation of idle network bandwidth without per-flow buffering at the routers and with minimal buffering at the source nodes. PVC averts priority inversion through preemption of lower-priority packets. By controlling preemption aggressiveness, PVC enables a trade-off between the strength of the guarantees and overall throughput. Finally, PVC simplifies network management through a flexible allocation mechanism that enables per-application bandwidth provisioning independent of thread count and supports transparent bandwidth recycling among an application's threads.