Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches
IEEE Transactions on Computers
Traffic analysis for on-chip networks design of multimedia applications
Proceedings of the 39th annual Design Automation Conference
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
aSOC: A Scalable, Single-Chip Communications Architecture
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Networks on Silicon: Combining Best-Effort and Guaranteed Services
Proceedings of the conference on Design, automation and test in Europe
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
Proceedings of the conference on Design, automation and test in Europe - Volume 2
QNoC: QoS architecture and design process for network on chip
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Evaluation of queue designs for true fully adaptive routers
Journal of Parallel and Distributed Computing
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Load Distribution with the Proximity Congestion Awareness in a Network on Chip
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
An architecture and compiler for scalable on-chip communication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
An Asynchronous Router for Multiple Service Levels Networks on Chip
ASYNC '05 Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems
An Asynchronous NOC Architecture Providing Low Latency Service and Its Multi-Level Design Framework
ASYNC '05 Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Quality-of-Service Mechanism for Interconnection Networks in System-on-Chips
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Æthereal Network on Chip: Concepts, Architectures, and Implementations
IEEE Design & Test
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Proceedings of the 43rd annual Design Automation Conference
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Congestion-controlled best-effort communication for networks-on-chip
Proceedings of the conference on Design, automation and test in Europe
Introducing the SuperGT network-on-chip: SuperGT QoS: more than just GT
Proceedings of the 44th annual Design Automation Conference
Analysis and optimization of prediction-based flow control in networks-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A dynamically-allocated virtual channel architecture with congestion awareness for on-chip routers
Proceedings of the 45th annual Design Automation Conference
Adaptive insertion policies for managing shared caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Achieving predictable performance through better memory controller placement in many-core CMPs
Proceedings of the 36th annual international symposium on Computer architecture
Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Complexity effective memory access scheduling for many-core accelerator architectures
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Application-aware prioritization mechanisms for on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP control: controlled shared cache management in chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Aérgia: exploiting packet latency slack in on-chip networks
Proceedings of the 37th annual international symposium on Computer architecture
aelite: a flit-synchronous network on chip with composable and predictable services
Proceedings of the Conference on Design, Automation and Test in Europe
ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
Throughput-Effective On-Chip Networks for Manycore Accelerators
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A case for heterogeneous on-chip interconnects for CMPs
Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees
Proceedings of the 38th annual international symposium on Computer architecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture
HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
CPU-assisted GPGPU on fused CPU-GPU architectures
HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Exploring NoC Virtualization Alternatives in CMPs
PDP '12 Proceedings of the 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing
A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC
Proceedings of the 49th Annual Design Automation Conference
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
Virtualizing Virtual Channels for Increased Network-on-Chip Robustness and Upgradeability
ISVLSI '12 Proceedings of the 2012 IEEE Computer Society Annual Symposium on VLSI
HAT: Heterogeneous Adaptive Throttling for On-Chip Networks
SBAC-PAD '12 Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing
Hi-index | 0.00 |
Current heterogeneous chip-multiprocessors (CMPs) integrate a GPU architecture on a die. However, the heterogeneity of this architecture inevitably exerts different pressures on shared resource management due to differing characteristics of CPU and GPU cores. We consider how to efficiently share on-chip resources between cores within the heterogeneous system, in particular the on-chip network. Heterogeneous architectures use an on-chip interconnection network to access shared resources such as last-level cache tiles and memory controllers, and this type of on-chip network will have a significant impact on performance. In this article, we propose a feedback-directed virtual channel partitioning (VCP) mechanism for on-chip routers to effectively share network bandwidth between CPU and GPU cores in a heterogeneous architecture. VCP dedicates a few virtual channels to CPU and GPU applications with separate injection queues. The proposed mechanism balances on-chip network bandwidth for applications running on CPU and GPU cores by adaptively choosing the best partitioning configuration. As a result, our mechanism improves system throughput by 15% over the baseline across 39 heterogeneous workloads.