Current state-of-the-art on-chip networks provide efficiency, high throughput, and low latency for one-to-one (unicast) traffic. One-to-many (multicast) or one-to-all (broadcast) traffic can significantly degrade the performance of these designs, because they implement one-to-many communication as multiple unicasts. The result is a burst of packets from a single source, which is a very inefficient way to perform multicast and broadcast communication. This inefficiency is compounded by the proliferation of architectures and coherence protocols that require multicast and broadcast communication. In this paper, we characterize a wide array of on-chip communication scenarios that benefit from hardware multicast support. We propose Virtual Circuit Tree Multicasting (VCTM) and present a detailed multicast router design that improves network performance by up to 90% while reducing network activity (hence power) by up to 53%. Our VCTM router is flexible enough to improve interconnect performance for a broad spectrum of multicasting scenarios, and it achieves these benefits with straightforward and inexpensive extensions to a state-of-the-art packet-switched router.
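
To make the cost of multicast-by-repeated-unicast concrete, the following minimal sketch counts link traversals for both approaches. It rests on assumptions that are not taken from the paper: a 2D mesh, dimension-ordered (X then Y) routing, and a multicast tree formed simply as the union of the XY paths to each destination. All function names are hypothetical, and the sketch does not model how VCTM itself builds or stores its trees.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]   # (x, y) coordinate of a router in an assumed 2D mesh
Link = Tuple[Node, Node]  # directed link between adjacent routers


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Links on the dimension-ordered path: travel along X first, then Y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src: Node, dests: List[Node]) -> int:
    """One packet per destination: every path is traversed in full."""
    return sum(len(xy_route(src, d)) for d in dests)


def tree_link_traversals(src: Node, dests: List[Node]) -> int:
    """One packet replicated at branch points: each shared link is used once.

    Assumption: the tree is the union of the XY paths, which is a tree
    because XY routing gives a unique path from the source to each node.
    """
    shared: Set[Link] = set()
    for d in dests:
        shared.update(xy_route(src, d))
    return len(shared)


if __name__ == "__main__":
    source = (0, 0)
    destinations = [(3, 0), (3, 1), (3, 2), (3, 3)]
    print(unicast_link_traversals(source, destinations))  # 18
    print(tree_link_traversals(source, destinations))     # 6
```

In this example the source injects four packets in the unicast case and one in the tree case, and the links along the shared X segment are traversed four times instead of once. The numbers describe only this toy configuration and are not a reproduction of the paper's results.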