No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips

Authors:
Shu-Hsuan Chou;Chien-Chih Chen;Chi-Neng Wen;Yi-Chao Chan;Tien-Fu Chen;Chao-Ching Wang;Jinn-Shyan Wang
Affiliations:
National Chung Cheng University, Taiwan, R.O.C.;National Chung Cheng University, Taiwan, R.O.C.;National Chung Cheng University, Taiwan, R.O.C.;National Chung Cheng University, Taiwan, R.O.C.;National Chung Cheng University, Taiwan, R.O.C.;National Chung Cheng University, Taiwan, R.O.C.;National Chung Cheng University, Taiwan, R.O.C.
Venue:
Proceedings of the 46th Annual Design Automation Conference
Year:
2009

Abstract

In line with the trend toward many-core SoCs and 3D chip integration, this paper proposes a "single-cycle ring" interconnection (SC_Ring) with ultra-low latency and minimal complexity. The SC_Ring allows multiple single-cycle transactions to proceed in parallel. The circuit-switched design consists of a set of 3-ported circuit-switched routers (4 to 16) and an arbiter that is efficient in both performance and timing. The arbiter, called "BTPC", performs arbitration and routing control in a single cycle by means of the novel Binary-Tree paths convergence and path-prediction mechanisms, which greatly reduce its time complexity. Combined with 3D chip integration, the proposed ring-based interconnection offers cost, latency, and power reductions for hierarchical clustering in future many-core systems. Moreover, as a case study, this work builds a "level-1 non-uniform cache architecture" (L1-NUCA) on the SC_Ring, providing fast data communication without cache coherence for multithreaded and multi-core execution. Finally, experimental results show that the approach yields promising performance.
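To illustrate how a circuit-switched ring can carry several transactions in the same cycle, the following is a minimal sketch, not the BTPC arbiter described in the paper. It assumes a unidirectional ring (the abstract does not state the direction), a fixed request priority order, and a simple greedy check that grants a request only when its ring segment does not overlap one already granted. The names `links_needed` and `arbitrate` are hypothetical.

```python
def links_needed(src, dst, n):
    """Return the set of links used going clockwise from src to dst.

    Assumption: link i connects node i to node (i + 1) % n on an n-node ring.
    """
    hops = (dst - src) % n
    return {(src + k) % n for k in range(hops)}


def arbitrate(requests, n):
    """Greedily grant requests whose ring segments do not overlap.

    requests: list of (src, dst) pairs, highest priority first.
    Returns the requests granted for this cycle, in priority order.
    This stands in for the paper's arbiter only as an illustration.
    """
    busy = set()      # links already reserved this cycle
    granted = []
    for src, dst in requests:
        path = links_needed(src, dst, n)
        # Skip empty paths (src == dst) and paths that collide with a grant.
        if path and not (path & busy):
            busy |= path
            granted.append((src, dst))
    return granted


if __name__ == "__main__":
    # On an 8-node ring, three of these four requests use disjoint segments.
    print(arbitrate([(0, 2), (1, 3), (2, 5), (6, 0)], 8))
```

In this example the request from node 1 to node 3 is denied because it shares a link with the higher-priority request from node 0 to node 2, while the other three requests occupy disjoint segments and are granted together.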