DESC: energy-efficient data exchange using synchronized counters

Authors:
Mahdi Nazm Bojnordi; Engin Ipek
Affiliations:
University of Rochester, Rochester, NY; University of Rochester, Rochester, NY
Venue:
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2013

Abstract

Increasing cache sizes in modern microprocessors require long wires to connect cache arrays to processor cores. As a result, the last-level cache (LLC) has become a major contributor to processor energy, necessitating techniques to increase the energy efficiency of data exchange over LLC interconnects. This paper presents an energy-efficient data exchange mechanism using synchronized counters. The key idea is to represent information by the delay between two consecutive pulses on a set of wires, which makes the number of state transitions on the interconnect independent of the data patterns, and significantly lowers the activity factor. Simulation results show that the proposed technique reduces overall processor energy by 7%, and the L2 cache energy by 1.81× on a set of sixteen parallel applications. This efficiency gain is attained at a cost of less than 1% area overhead to the L2 cache, and a 2% delay overhead to execution time.
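The key idea in the abstract, encoding a value as the delay before a pulse while sender and receiver run counters in lockstep, can be illustrated with a minimal Python sketch. It is not the paper's circuit. The 4-bit chunk size, the single data wire, the level-toggle signaling, and all function names (`encode`, `decode`, `toggles`, `binary_toggles`) are assumptions made for illustration only.

```python
def encode(values, chunk_bits=4):
    """Return the data-wire level for every cycle.

    Assumption: each value gets a window of 2**chunk_bits cycles, and the
    sender toggles the wire once, in the cycle whose counter equals the value.
    """
    window = 1 << chunk_bits
    level, trace = 0, []
    for v in values:
        if not 0 <= v < window:
            raise ValueError(f"value {v} does not fit in {chunk_bits} bits")
        for count in range(window):      # models the sender's counter
            if count == v:
                level ^= 1               # the single transition carries the value
            trace.append(level)
    return trace


def decode(trace, chunk_bits=4):
    """Recover values by reading the receiver's counter at each transition."""
    window = 1 << chunk_bits
    prev, values = 0, []
    for start in range(0, len(trace), window):
        for count in range(window):      # models the receiver's counter
            level = trace[start + count]
            if level != prev:
                values.append(count)
                prev = level
    return values


def toggles(trace, initial=0):
    """Count wire transitions in a delay-encoded trace."""
    prev, n = initial, 0
    for level in trace:
        n += level != prev
        prev = level
    return n


def binary_toggles(values):
    """Count bit flips when the same values are sent in parallel binary form."""
    prev, n = 0, 0
    for v in values:
        n += bin(prev ^ v).count("1")
        prev = v
    return n


if __name__ == "__main__":
    data = [0, 15, 0, 15, 5]
    wire = encode(data)
    print(decode(wire) == data)          # round trip succeeds
    print(toggles(wire), binary_toggles(data))
```

In this sketch every value costs exactly one transition whatever its bit pattern, whereas the parallel binary transfer flips as many wires as consecutive values differ in bits. The price is a transfer window of up to 2**chunk_bits cycles per value, which is in line with the delay overhead the abstract reports as the cost of the energy saving.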