Custom networks-on-chip architectures with multicast routing

  • Authors:
  • Shan Yan;Bill Lin

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA;Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we consider the problem of synthesizing custom networks-on-chip (NoC) architectures that are optimized for a given application. We consider both unicast and multicast traffic flows in the input specification. Multicast traffic flows are used in a variety of applications, and their direct support with only replication of packets at optimal bifurcation points rather than full end-to-end replication can significantly reduce network contention and resource requirements. Our problem formulation is based on the decomposition of the problem into the inter-related steps of finding good flow partitions, deriving a good physical network topology for each group in the partition, and providing an optimized network implementation for the derived topologies. Our solutions may be comprised of multiple custom networks, each interconnecting a subset of communicating modules. We propose several algorithms that can systematically examine different flow partitions, and we propose Rectilinear-Steiner-Tree (RST)-based algorithms for generating efficient network topologies. Our design flow integrates floorplanning, and our solutions consider deadlock-free routing. Experimental results on a variety of NoC benchmarks showed that our synthesis results can on average achieve a 4.82 times reduction in power consumption over different mesh implementations on unicast benchmarks and a 1.92 times reduction in power consumption on multicast benchmarks. Significant improvements in performance were also achieved, with an average of 2.92 times reduction in hop count on unicast benchmarks and 1.82 times reduction in hop count on multicast benchmarks. To further gauge the effectiveness of our heuristic algorithms, we also implemented an exact algorithm that enumerates all distinct set partitions. For the benchmarks where exact results could be obtained, our algorithms on average can achieve results within 3 % of exact results, but with much shorter execution times.