Analysis and simulation of a fair queueing algorithm
SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
Virtual clock: a new traffic control algorithm for packet switching networks
SIGCOMM '90 Proceedings of the ACM symposium on Communications architectures & protocols
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A comparative analysis of disk scheduling policies
Communications of the ACM
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
QoS provisioning in clusters: an investigation of Router and NIC design
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Clustering Algorithms
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
QNoC: QoS architecture and design process for network on chip
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A novel dimensionally-decomposed router for on-chip communication in 3D architectures
Proceedings of the 34th annual international symposium on Computer architecture
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
The Power of Priority: NoC Based Distributed Cache Coherency
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
An efficient algorithm for exploiting multiple arithmetic units
IBM Journal of Research and Development
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Aérgia: exploiting packet latency slack in on-chip networks
Proceedings of the 37th annual international symposium on Computer architecture
Next generation on-chip networks: what kind of congestion control do we need?
Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
Contention-Aware Scheduling on Multicore Systems
ACM Transactions on Computer Systems (TOCS)
Thread criticality support in on-chip networks
Proceedings of the Third International Workshop on Network on Chip Architectures
Netrace: dependency-driven trace-based network-on-chip simulation
Proceedings of the Third International Workshop on Network on Chip Architectures
LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Characterizing the impact of process variation on 45 nm NoC-based CMPs
Journal of Parallel and Distributed Computing
CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs
Journal of Parallel and Distributed Computing
Memory systems in the many-core era: challenges, opportunities, and solution directions
Proceedings of the international symposium on Memory management
Prefetch-aware shared resource management for multi-core systems
Proceedings of the 38th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
Topology-Aware quality-of-service support in highly integrated chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Dynamic QoS management for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Application-aware prefetch prioritization in on-chip networks
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Application-to-core mapping policies to reduce memory interference in multi-core systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Toward on-chip datacenters: a perspective on general trends and on-chip particulars
The Journal of Supercomputing
Cost-effective contention avoidance in a CMP with shared memory controllers
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Moths: Mobile threads for on-chip networks
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Addressing End-to-End Memory Access Latency in NoC-Based Multicores
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proactive aging management in heterogeneous NoCs through a criticality-driven routing approach
Proceedings of the Conference on Design, Automation and Test in Europe
Utility-based acceleration of multithreaded applications on asymmetric CMPs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Catnap: energy proportional multiple network-on-chip
Proceedings of the 40th Annual International Symposium on Computer Architecture
A heterogeneous multiple network-on-chip design: an application-aware approach
Proceedings of the 50th Annual Design Automation Conference
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Enhancing both fairness and performance using rate-aware dynamic storage cache partitioning
DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture
Journal of Parallel and Distributed Computing
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
REF: resource elasticity fairness with sharing incentives for multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Network-on-Chips (NoCs) are likely to become a critical shared resource in future many-core processors. The challenge is to develop policies and mechanisms that enable multiple applications to efficiently and fairly share the network, to improve system performance. Existing local packet scheduling policies in the routers fail to fully achieve this goal, because they treat every packet equally, regardless of which application issued the packet. This paper proposes prioritization policies and architectural extensions to NoC routers that improve the overall application-level throughput, while ensuring fairness in the network. Our prioritization policies are application-aware, distinguishing applications based on the stall-time criticality of their packets. The idea is to divide processor execution time into phases, rank applications within a phase based on stall-time criticality, and have all routers in the network prioritize packets based on their applications' ranks. Our scheme also includes techniques that ensure starvation freedom and enable the enforcement of system-level application priorities. We evaluate the proposed prioritization policies on a 64-core CMP with an 8x8 mesh NoC, using a suite of 35 diverse applications. For a representative set of case studies, our proposed policy increases average system throughput by 25.6% over age-based arbitration and 18.4% over round-robin arbitration. Averaged over 96 randomly-generated multiprogrammed workload mixes, the proposed policy improves system throughput by 9.1% over the best existing prioritization policy, while also reducing application-level unfairness.