Variable latency caches for nanoscale processor

  • Authors:
  • Serkan Ozdemir;Arindam Mallik;Ja Chun Ku;Gokhan Memik;Yehea Ismail

  • Affiliations:
  • Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL

  • Venue:
  • Proceedings of the 2007 ACM/IEEE conference on Supercomputing
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Variability is one of the important issues in nanoscale processors. Due to increasing importance of interconnect structures in submicron technologies, the physical location and phenomena such as coupling have an increasing impact on the latency of operations. Therefore, traditional view of rigid access latencies to components wil result in suboptimal architectures. In this paper, we devise a cache architecture with variable access latency. Particularly, we a) develop a non-uniform access level 1 data-cache, b) study the impact of coupling and physical location on level 1 data cache access latencies, and c) develop and study an architecture where the variable latency cache can be accessed while the rest of the pipeline remains synchronous. To find the access latency with different input address transitions and environmental conditions, we first build a SPICE model at a 45nm technology for a cache similar to that of the level 1 data cache of the Intel Prescott architecture. Motivated by the large difference between the worst and best case latencies and the shape of the distribution curve, we change the cache architecture to allow variable latency accesses. Since the latency of the cache is not known at the time of instruction scheduling, we also modify the functional units with the addition of special queues that will temporarily store the dependent instructions and allow the data to be forwarded from the cache to the functional units correctly. Simulations based on SPEC2000 benchmarks show that our variable access latency cache structure can reduce the execution time by as much as 19.4% and 10.7% on average compared to a conventional cache architecture.