The turn model for adaptive routing
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks
IEEE Transactions on Parallel and Distributed Systems
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol
IEEE Transactions on Parallel and Distributed Systems
Shift Register Sequences
IEEE Transactions on Parallel and Distributed Systems
Multi-address Encoding for Multicast
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
A large scale, homogeneous, fully distributed parallel machine, I
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Connection-oriented Multicasting in Wormhole-switched Networks on Chip
ISVLSI '06 Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures
Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Efficient unicast and multicast support for CMPs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Recursive partitioning multicast: A bandwidth-efficient routing for Networks-on-Chip
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Trace-driven optimization of networks-on-chip configurations
Proceedings of the 47th Design Automation Conference
Group-caching for NoC based multicore cache coherent systems
Proceedings of the Conference on Design, Automation and Test in Europe
LADPM: Latency-Aware Dual-Partition Multicast Routing for Mesh-Based Network-on-Chips
ICPADS '10 Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems
A new theory of deadlock-free adaptive multicast routing in wormhole networks
SPDP '93 Proceedings of the 1993 5th IEEE Symposium on Parallel and Distributed Processing
Supporting efficient collective communication in NoCs
HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
New performance-driven FPGA routing algorithms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
NOCS '12 Proceedings of the 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip
Hi-index | 0.00 |
As the number of cores integrated onto a single chip increases, power dissipation and network latency become ever-increasingly stringent. On-chip network provides an efficient and scalable interconnection paradigm for chip multiprocessors (CMPs), wherein one-to-many (multicast) communication is universal for such platforms. Without efficient multicasting support, traditional unicasting on-chip networks will be low efficiency in tackling such multicast communication. In this paper, we propose Dual Partitioning Multicasting (DPM) to reduce packet latency and balance network resource utilization. Specifically, DPM scheme adaptively makes routing decisions based on the network load-balance level as well as the link sharing patterns characterized by the distribution of the multicasting destinations. Extensive experimental results for synthetic traffic as well as real applications show that compared with the recently proposed RPM scheme, DPM significantly reduces the average packet latency and mitigates the network power consumption. More importantly, DPM is highly scalable for future on-chip networks with heavy traffic load and varieties of traffic patterns.