Principles of CMOS VLSI design: a systems perspective
Principles of CMOS VLSI design: a systems perspective
Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Generating representative Web workloads for network and server performance evaluation
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Digital systems engineering
Getting to the bottom of deep submicron II: a global wiring paradigm
ISPD '99 Proceedings of the 1999 international symposium on Physical design
A novel VLSI layout fabric for deep sub-micron applications
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Itanium 2 Processor Microarchitecture
IEEE Micro
IEEE Spectrum
POWER4 system microarchitecture
IBM Journal of Research and Development
The circuit and physical design of the POWER4 microprocessor
IBM Journal of Research and Development
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Package level interconnect options
Proceedings of the 2005 international workshop on System level interconnect prediction
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Surfliner: A Distortionless Electrical Signaling Scheme for Speed of Light On-Chip Communications
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Constant impedance scaling paradigm for interconnect synthesis
Proceedings of the 2006 international workshop on System-level interconnect prediction
Constant Impedance Scaling Paradigm for Scaling LC transmission lines
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Analysis and modeling of power grid transmission lines
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Analysis and modeling of power grid transmission lines
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Achieving structural and composable modeling of complex systems
International Journal of Parallel Programming - Special issue: The next generation software program
Proceedings of the 20th annual international conference on Supercomputing
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
Power reduction of CMP communication networks via RF-interconnects
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
HiPC'07 Proceedings of the 14th international conference on High performance computing
The SKB: a semi-completely-connected bus for on-chip systems
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Journal of Systems Architecture: the EUROMICRO Journal
Light NUCA: a proposal for bridging the inter-cache latency gap
Proceedings of the Conference on Design, Automation and Test in Europe
Enhancing L2 organization for CMPs with a center cell
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hierarchical circuit-switched NoC for multicore video processing
Microprocessors & Microsystems
Wafer-level package interconnect options
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
TLSync: support for multiple fast barriers using on-chip transmission lines
Proceedings of the 38th annual international symposium on Computer architecture
A case for globally shared-medium on-chip interconnect
Proceedings of the 38th annual international symposium on Computer architecture
A design space exploration of transmission-line links for on-chip interconnect
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Enhancing effective throughput for transmission line-based bus
Proceedings of the 39th Annual International Symposium on Computer Architecture
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
DESC: energy-efficient data exchange using synchronized counters
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
It is widely accepted that the disproportionate scalingof transistor and conventional on-chip interconnect performancepresents a major barrier to future high performancesystems. Previous research has focused on wire-centricdesigns that use parallelism, locality, and on-chipwiring bandwidth to compensate for long wire latency.An alternative approach to this problem is to exploitnewly-emerging on-chip transmission line technology toreduce communication latency. Compared to conventionalRC wires, transmission lines can reduce delay by up to afactor of 30 for global wires, while eliminating the needfor repeaters. However, this latency reduction comes at thecost of a comparable reduction in bandwidth.In this paper, we investigate using transmission linesto access large level-2 on-chip caches. We propose a familyof Transmission Line Cache (TLC) designs that representdifferent points in the latency/bandwidth spectrum.Compared to the recently-proposed Dynamic Non-UniformCache Architecture (DNUCA) design, the base TLCdesign reduces the required cache area by 18% andreduces the interconnection network's dynamic powerconsumption by an average of 61%. The optimized TLCdesigns attain similar performance using fewer transmis-sionlines but with some additional complexity. Simulationresults using full-system simulation show that TLC providesmore consistent performance than the DNUCAdesign across a wide variety of workloads. TLC caches arelogically simpler than DNUCA designs, but requiregreater circuit and manufacturing complexity.