LP-NUCA: networks-in-cache for high-performance low-power embedded processors

  • Authors:
  • Darío Suárez Gracia;Giorgos Dimitrakopoulos;Teresa Monreal Arnal;Manolis G. H. Katevenis;Víctor Viñals Yúfera

  • Affiliations:
  • Computer Architecture Group, Departamento de Informática e Ingeniería de Sistemas, Instituto de Investigación en Ingeniería de Aragón, Universidad de Zaragoza, Zaragoza, S ...;Informatics and Communications Engineering Department, University of West Macedonia, Kozani, Greece;Department of Computer Architecture, Universitat Politécnica de Catalunya, Catalunya, Spain and Computer Architecture Group, Universidad de Zaragoza, Zaragoza, Spain;Foundation for Research and Technology-Hellas, Institute of Computer Science, Computer Architecture and VLSI Systems Laboratory, Heraklion, Crete and Department of Computer Science, University of ...;Computer Architecture Group, Departamento de Informática e Ingeniería de Sistemas, Instituto de Investigación en Ingeniería de Aragón, Universidad de Zaragoza, Zaragoza, S ...

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2012

Abstract

High-end embedded processors demand complex on-chip cache hierarchies that satisfy conflicting design requirements, such as high-performance operation and low energy consumption. This paper introduces light-power (LP) nonuniform cache architecture (NUCA), a tiled cache that addresses both goals. LP-NUCA places a group of small, low-latency tiles between the L1 and the last-level cache (LLC); these tiles adapt better to application working sets and keep the most recently evicted blocks close to L1. LP-NUCA is built around three specialized "networks-in-cache," each aimed at a separate cache operation. To demonstrate the feasibility of the design, we have fully implemented LP-NUCA in a 90-nm technology. From the VLSI implementation, we observe that the proposed networks-in-cache incur minimal area, latency, and power overhead. To further reduce energy consumption, LP-NUCA employs two network-wide techniques (miss wave stopping and sectoring) that together reduce the dynamic cache energy by 35% without degrading performance. Our evaluations also show that LP-NUCA improves performance with respect to cache hierarchies similar to those found in high-end embedded processors. Similar results have been obtained after scaling to a 32-nm technology.