TLC: Transmission Line Caches

Authors:
Bradford M. Beckmann; David A. Wood
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison; Computer Sciences Department, University of Wisconsin-Madison
Venue:
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2003


Abstract

It is widely accepted that the disproportionate scaling of transistor and conventional on-chip interconnect performance presents a major barrier to future high performance systems. Previous research has focused on wire-centric designs that use parallelism, locality, and on-chip wiring bandwidth to compensate for long wire latency. An alternative approach to this problem is to exploit newly-emerging on-chip transmission line technology to reduce communication latency. Compared to conventional RC wires, transmission lines can reduce delay by up to a factor of 30 for global wires, while eliminating the need for repeaters. However, this latency reduction comes at the cost of a comparable reduction in bandwidth.

In this paper, we investigate using transmission lines to access large level-2 on-chip caches. We propose a family of Transmission Line Cache (TLC) designs that represent different points in the latency/bandwidth spectrum. Compared to the recently-proposed Dynamic Non-Uniform Cache Architecture (DNUCA) design, the base TLC design reduces the required cache area by 18% and reduces the interconnection network's dynamic power consumption by an average of 61%. The optimized TLC designs attain similar performance using fewer transmission lines but with some additional complexity. Simulation results using full-system simulation show that TLC provides more consistent performance than the DNUCA design across a wide variety of workloads. TLC caches are logically simpler than DNUCA designs, but require greater circuit and manufacturing complexity.
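
The latency argument in the abstract can be illustrated with a rough first-order sketch. It is not the authors' model and does not attempt to reproduce the reported factor of 30. It only contrasts how the delay of an unrepeated distributed RC wire grows with the square of its length, while the time of flight on an ideal transmission line grows linearly. The function names, the per-length resistance and capacitance, and the 10 mm wire length are all assumptions chosen for illustration. The relative permittivity of 3.9 is the usual value for silicon dioxide.

```python
import math

C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def rc_wire_delay_s(length_m, r_ohm_per_m, c_farad_per_m):
    """Approximate 50% delay of an unrepeated distributed RC wire.

    Uses the common first-order estimate 0.38 * R_total * C_total,
    so the delay grows with the square of the wire length.
    """
    r_total = r_ohm_per_m * length_m
    c_total = c_farad_per_m * length_m
    return 0.38 * r_total * c_total


def transmission_line_delay_s(length_m, rel_permittivity):
    """Time of flight on an ideal (lossless) transmission line.

    The signal travels at c / sqrt(eps_r), so delay is linear in length.
    """
    return length_m * math.sqrt(rel_permittivity) / C_LIGHT_M_PER_S


if __name__ == "__main__":
    # Hypothetical wire parameters, for illustration only.
    R_PER_M = 1.0e5    # 0.1 ohm per micrometre (assumed)
    C_PER_M = 2.0e-10  # 0.2 fF per micrometre (assumed)
    EPS_R = 3.9        # silicon dioxide dielectric
    LENGTH_M = 0.010   # a 10 mm global wire (assumed)

    rc = rc_wire_delay_s(LENGTH_M, R_PER_M, C_PER_M)
    tl = transmission_line_delay_s(LENGTH_M, EPS_R)
    print(f"RC wire delay:           {rc * 1e12:.1f} ps")
    print(f"Transmission line delay: {tl * 1e12:.1f} ps")
    print(f"Ratio (RC / TL):         {rc / tl:.1f}")
```

Real designs insert repeaters into long RC wires, which makes their delay closer to linear in length at the cost of area and power. The ratio printed here therefore depends entirely on the assumed parameters and should not be read as a prediction for any particular process technology.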