Instruction caches account for a large share of chip energy consumption, which makes them a critical concern for battery-powered embedded devices. The energy consumption of the first-level instruction cache (L1-I) can potentially be reduced by decreasing its size and associativity. However, demanding applications may then suffer a dramatic performance degradation, especially in superscalar multi-threaded processors, where multiple threads access the L1-I in each cycle to fetch instructions. We introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that replaces the conventional L2 and improves the Energy-Delay of the system. iLP-NUCA adds a new tree-based transport network topology that reduces latency and energy consumption compared with former LP-NUCA implementations. With iLP-NUCA we can reduce the size of the L1-I while outperforming conventional cache hierarchies and reducing overall energy consumption, independently of the number of threads.