Application-aware prioritization mechanisms for on-chip networks

Authors:
Reetuparna Das;Onur Mutlu;Thomas Moscibroda;Chita R. Das
Affiliations:
Pennsylvania State University;Carnegie Mellon University;Microsoft Research;Pennsylvania State University
Venue:
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2009

Citing 28
Cited 34

Analysis and simulation of a fair queueing algorithm

SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
Virtual clock: a new traffic control algorithm for packet switching networks

SIGCOMM '90 Proceedings of the ACM symposium on Communications architectures & protocols
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Load latency tolerance in dynamically scheduled processors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A comparative analysis of disk scheduling policies

Communications of the ACM
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
QoS provisioning in clusters: an investigation of Router and NIC design

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Clustering Algorithms

Clustering Algorithms
Myrinet: A Gigabit-per-Second Local Area Network

IEEE Micro
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
QNoC: QoS architecture and design process for network on chip

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
Principles and Practices of Interconnection Networks

Principles and Practices of Interconnection Networks
Trade Offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance

IEEE Micro
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A novel dimensionally-decomposed router for on-chip communication in 3D architectures

Proceedings of the 34th annual international symposium on Computer architecture
Express virtual channels: towards the ideal interconnection fabric

Proceedings of the 34th annual international symposium on Computer architecture
The Power of Priority: NoC Based Distributed Cache Coherency

NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Flattened Butterfly Topology for On-Chip Networks

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
System-Level Performance Metrics for Multiprogram Workloads

IEEE Micro
An efficient algorithm for exploiting multiple arithmetic units

IBM Journal of Research and Development

Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Aérgia: exploiting packet latency slack in on-chip networks

Proceedings of the 37th annual international symposium on Computer architecture
Next generation on-chip networks: what kind of congestion control do we need?

Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
Contention-Aware Scheduling on Multicore Systems

ACM Transactions on Computer Systems (TOCS)
Thread criticality support in on-chip networks

Proceedings of the Third International Workshop on Network on Chip Architectures
Netrace: dependency-driven trace-based network-on-chip simulation

Proceedings of the Third International Workshop on Network on Chip Architectures
LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Characterizing the impact of process variation on 45 nm NoC-based CMPs

Journal of Parallel and Distributed Computing
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs

Journal of Parallel and Distributed Computing
Memory systems in the many-core era: challenges, opportunities, and solution directions

Proceedings of the international symposium on Memory management
Prefetch-aware shared resource management for multi-core systems

Proceedings of the 38th annual international symposium on Computer architecture
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems

ACM Transactions on Computer Systems (TOCS)
Topology-Aware quality-of-service support in highly integrated chip multiprocessors

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems

Proceedings of the 39th Annual International Symposium on Computer Architecture
On-chip networks from a networking perspective: congestion and scalability in many-core interconnects

Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Dynamic QoS management for chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Application-aware prefetch prioritization in on-chip networks

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Application-to-core mapping policies to reduce memory interference in multi-core systems

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
On-chip networks from a networking perspective: congestion and scalability in many-core interconnects

ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Toward on-chip datacenters: a perspective on general trends and on-chip particulars

The Journal of Supercomputing
Cost-effective contention avoidance in a CMP with shared memory controllers

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Moths: Mobile threads for on-chip networks

ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Addressing End-to-End Memory Access Latency in NoC-Based Multicores

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proactive aging management in heterogeneous NoCs through a criticality-driven routing approach

Proceedings of the Conference on Design, Automation and Test in Europe
Utility-based acceleration of multithreaded applications on asymmetric CMPs

Proceedings of the 40th Annual International Symposium on Computer Architecture
Catnap: energy proportional multiple network-on-chip

Proceedings of the 40th Annual International Symposium on Computer Architecture
A heterogeneous multiple network-on-chip design: an application-aware approach

Proceedings of the 50th Annual Design Automation Conference
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Enhancing both fairness and performance using rate-aware dynamic storage cache partitioning

DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture

Journal of Parallel and Distributed Computing
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
REF: resource elasticity fairness with sharing incentives for multiprocessors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
PAIS: Parallelism-aware interconnect scheduling in multicores

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Network-on-Chips (NoCs) are likely to become a critical shared resource in future many-core processors. The challenge is to develop policies and mechanisms that enable multiple applications to efficiently and fairly share the network, to improve system performance. Existing local packet scheduling policies in the routers fail to fully achieve this goal, because they treat every packet equally, regardless of which application issued the packet. This paper proposes prioritization policies and architectural extensions to NoC routers that improve the overall application-level throughput, while ensuring fairness in the network. Our prioritization policies are application-aware, distinguishing applications based on the stall-time criticality of their packets. The idea is to divide processor execution time into phases, rank applications within a phase based on stall-time criticality, and have all routers in the network prioritize packets based on their applications' ranks. Our scheme also includes techniques that ensure starvation freedom and enable the enforcement of system-level application priorities. We evaluate the proposed prioritization policies on a 64-core CMP with an 8x8 mesh NoC, using a suite of 35 diverse applications. For a representative set of case studies, our proposed policy increases average system throughput by 25.6% over age-based arbitration and 18.4% over round-robin arbitration. Averaged over 96 randomly-generated multiprogrammed workload mixes, the proposed policy improves system throughput by 9.1% over the best existing prioritization policy, while also reducing application-level unfairness.