Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures

Authors:
Jaekyu Lee;Si Li;Hyesoon Kim;Sudhakar Yalamanchili
Affiliations:
Georgia Institute of Technology;Georgia Institute of Technology;Georgia Institute of Technology;Georgia Institute of Technology
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Year:
2013

Abstract

Current heterogeneous chip multiprocessors (CMPs) integrate a GPU architecture on the same die as the CPU cores. However, because CPU and GPU cores have different characteristics, this heterogeneity inevitably puts different pressures on shared resource management. We consider how to efficiently share on-chip resources, in particular the on-chip network, between cores within the heterogeneous system. Heterogeneous architectures use an on-chip interconnection network to access shared resources such as last-level cache tiles and memory controllers, so this network has a significant impact on performance. In this article, we propose a feedback-directed virtual channel partitioning (VCP) mechanism for on-chip routers to effectively share network bandwidth between CPU and GPU cores in a heterogeneous architecture. VCP dedicates a few virtual channels to CPU and GPU applications with separate injection queues. The proposed mechanism balances on-chip network bandwidth for applications running on CPU and GPU cores by adaptively choosing the best partitioning configuration. As a result, our mechanism improves system throughput by 15% over the baseline across 39 heterogeneous workloads.
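
The abstract does not spell out how the best partitioning configuration is chosen. As a rough illustration only, a feedback-directed selection could sample each candidate split of virtual channels and keep the one with the highest measured score. The sketch below is not the authors' algorithm: the names `candidate_partitions`, `choose_partition`, and `toy_throughput`, the `(cpu_vcs, gpu_vcs)` representation, and the throughput model are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: (cpu_vcs, gpu_vcs) is the number of virtual
# channels per port dedicated to CPU traffic and to GPU traffic.
Partition = Tuple[int, int]


def candidate_partitions(total_vcs: int, min_per_class: int = 1) -> List[Partition]:
    """Enumerate every split of total_vcs that gives each class at least min_per_class."""
    return [
        (cpu, total_vcs - cpu)
        for cpu in range(min_per_class, total_vcs - min_per_class + 1)
    ]


def choose_partition(
    candidates: List[Partition],
    measure: Callable[[Partition], float],
) -> Tuple[Partition, Dict[Partition, float]]:
    """Sample each candidate once and return the one with the highest score.

    `measure` stands in for running the system under a configuration for a
    sampling period and reading back a throughput metric (an assumption).
    """
    scores = {p: measure(p) for p in candidates}
    best = max(candidates, key=lambda p: scores[p])
    return best, scores


def toy_throughput(p: Partition) -> float:
    """Made-up feedback signal, not taken from the article."""
    cpu_vcs, gpu_vcs = p
    # Assumed shape: CPU benefit saturates at 2 VCs, GPU benefit grows sublinearly.
    return min(cpu_vcs, 2) * 1.0 + gpu_vcs ** 0.5


if __name__ == "__main__":
    best, scores = choose_partition(candidate_partitions(6), toy_throughput)
    print("best partition (cpu_vcs, gpu_vcs):", best)
```

In a real router the measurement would come from hardware counters sampled periodically, and the choice would be revisited as workload behavior changes; this sketch only shows the sample-then-select step.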