Shrinking L1 instruction caches to improve energy-delay in SMT embedded processors

Authors:
Alexandra Ferrerón-Labari;Marta Ortín-Obón;Darío Suárez-Gracia;Jesús Alastruey-Benedé;Víctor Viñals-Yúfera
Affiliations:
gaZ--DIIS--I3A, Universidad de Zaragoza, Spain;gaZ--DIIS--I3A, Universidad de Zaragoza, Spain;gaZ--DIIS--I3A, Universidad de Zaragoza, Spain;gaZ--DIIS--I3A, Universidad de Zaragoza, Spain;gaZ--DIIS--I3A, Universidad de Zaragoza, Spain
Venue:
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Year:
2013



Abstract

Instruction caches account for a high percentage of chip energy consumption, which makes them a critical issue for battery-powered embedded devices. We can potentially reduce the energy consumption of the first-level instruction cache (L1-I) by decreasing its size and associativity. However, demanding applications may suffer dramatic performance degradation, especially in superscalar multithreaded processors, where multiple threads access the L1-I in each cycle to fetch instructions. We introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that replaces the conventional L2 and improves the Energy-Delay of the system. iLP-NUCA adds a new tree-based transport network topology that reduces latency and energy consumption compared with previous LP-NUCA implementations. With iLP-NUCA we can shrink the L1-I while outperforming conventional cache hierarchies and reducing overall energy consumption, independently of the number of threads.
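
The Energy-Delay metric named in the abstract is commonly computed as the product of total energy and execution time, so a configuration can be slightly slower and still come out ahead if it saves enough energy. The sketch below illustrates that trade-off only. The `Config` type, the configuration names, and every number in it are hypothetical placeholders chosen for illustration; none of them are results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A cache-hierarchy configuration with hypothetical, normalized figures."""
    name: str
    energy: float  # total energy, normalized to the baseline (assumed value)
    delay: float   # execution time, normalized to the baseline (assumed value)


def energy_delay_product(cfg: Config) -> float:
    """Energy-delay product: lower is better."""
    return cfg.energy * cfg.delay


def relative_edp(candidate: Config, baseline: Config) -> float:
    """EDP of the candidate divided by EDP of the baseline (< 1 is an improvement)."""
    return energy_delay_product(candidate) / energy_delay_product(baseline)


# Illustrative numbers only: a smaller L1-I that saves 20% energy
# at the cost of a 10% longer execution time.
baseline = Config("larger L1-I, conventional L2", energy=1.00, delay=1.00)
candidate = Config("smaller L1-I, alternative L2", energy=0.80, delay=1.10)

ratio = relative_edp(candidate, baseline)
print(f"{candidate.name}: relative EDP = {ratio:.2f}")
```

With these made-up inputs the relative EDP is below 1, meaning the energy saving outweighs the slowdown. If the delay penalty grew enough to push the product above 1, the smaller cache would be a net loss under this metric.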