Variable latency caches for nanoscale processor

Authors:
Serkan Ozdemir;Arindam Mallik;Ja Chun Ku;Gokhan Memik;Yehea Ismail
Affiliations:
Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Abstract

Variability is one of the important issues in nanoscale processors. Due to the increasing importance of interconnect structures in submicron technologies, physical location and phenomena such as coupling have a growing impact on the latency of operations. Therefore, the traditional view of rigid access latencies to components will result in suboptimal architectures. In this paper, we devise a cache architecture with variable access latency. In particular, we a) develop a non-uniform access level 1 data cache, b) study the impact of coupling and physical location on level 1 data cache access latencies, and c) develop and study an architecture where the variable latency cache can be accessed while the rest of the pipeline remains synchronous. To find the access latency under different input address transitions and environmental conditions, we first build a SPICE model at a 45nm technology for a cache similar to the level 1 data cache of the Intel Prescott architecture. Motivated by the large difference between the worst-case and best-case latencies and the shape of the distribution curve, we change the cache architecture to allow variable latency accesses. Since the latency of the cache is not known at the time of instruction scheduling, we also modify the functional units by adding special queues that temporarily store the dependent instructions and allow the data to be forwarded from the cache to the functional units correctly. Simulations based on SPEC2000 benchmarks show that our variable access latency cache structure can reduce the execution time by as much as 19.4%, and by 10.7% on average, compared to a conventional cache architecture.
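
The idea of letting dependent instructions wait at the functional unit until the cache data arrives can be illustrated with a toy cycle-count model. This is only a minimal sketch under stated assumptions, not the architecture evaluated in the paper: the function names (`run_fixed`, `run_variable`), the one-cycle dependent operation, the strictly serial load-then-use chain, and the latency values are all hypothetical and chosen purely for illustration.

```python
def run_fixed(latencies, worst_case):
    """Conventional cache: every load is charged the worst-case latency.

    Each load is followed by one dependent operation that takes 1 cycle
    (an assumption of this toy model).
    """
    return len(latencies) * (worst_case + 1)


def run_variable(latencies, best_case):
    """Variable latency cache with a holding queue at the functional unit.

    The scheduler wakes the dependent operation assuming best_case latency.
    If the data is not there yet, the operation waits in the queue until
    the load completes. Returns (total_cycles, cycles_spent_waiting).
    """
    cycle = 0
    waited = 0
    for lat in latencies:
        data_ready = cycle + lat        # when the cache actually delivers
        wakeup = cycle + best_case      # optimistic wake-up by the scheduler
        waited += max(0, data_ready - wakeup)
        cycle = max(wakeup, data_ready) + 1  # dependent op executes for 1 cycle
    return cycle, waited


# Hypothetical per-access latencies in cycles (not measured data).
latencies = [2, 2, 3, 2, 4, 2]

fixed_cycles = run_fixed(latencies, worst_case=max(latencies))
variable_cycles, queue_wait = run_variable(latencies, best_case=min(latencies))

print(fixed_cycles, variable_cycles, queue_wait)
```

In this toy setting the variable latency run finishes earlier because only the slow accesses pay extra cycles, and those cycles show up as waiting time in the queue rather than as a fixed penalty on every access. The numbers say nothing about the speedups reported for SPEC2000.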