Modern computer architectures that support variable-length instruction set architectures (ISAs), such as Intel's IA-32, distinguish between the architectural representation of instructions and their micro-architectural representation. At the micro-architectural level, instructions are represented by fixed-length micro-operations, termed uops, and complex instructions are broken into a sequence of uops. The fetch and decode operations in such architectures are extremely complicated and power hungry, especially if they aim to handle several variable-length instructions per cycle. This paper suggests caching the uop sequences of decoded instructions in a special structure, termed the uop cache (UC), and using this fixed-length decoded format whenever possible. Doing so reduces the processor's power and energy consumption without compromising performance. We show that a moderately sized UC can eliminate about 75% of instruction decodes across a broad range of benchmarks, and over 90% in multimedia applications and high-power tests. For existing Intel P6 family processors, the eliminated work may save about 10% of full-chip power consumption. While the proposed technique can be used to save power without degrading performance, it can also be used to improve processor performance when power is constrained.
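The idea of reusing decoded uops instead of decoding again can be illustrated with a toy model. The following is only a minimal sketch, not the paper's design: the `UopCache` class, its LRU replacement policy, the `toy_decode` function, and the per-instruction (rather than per-line or per-trace) indexing are all assumptions made for illustration, and the resulting numbers say nothing about the paper's measurements.

```python
from collections import OrderedDict


class UopCache:
    """Toy model: maps an instruction address to its decoded uop sequence.

    LRU replacement and per-instruction entries are assumptions for this
    sketch; the abstract does not specify the UC organization.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0      # fetches served from the cache (decode skipped)
        self.decodes = 0   # fetches that had to go through the decoder

    def fetch(self, address, decode):
        if address in self.entries:
            # Hit: reuse the stored fixed-length uops, skip the decoder.
            self.entries.move_to_end(address)
            self.hits += 1
            return self.entries[address]
        # Miss: decode the variable-length instruction and store the result.
        uops = decode(address)
        self.decodes += 1
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return uops


def toy_decode(address):
    # Hypothetical decoder: pretend every instruction yields two uops.
    return (f"uop{address}a", f"uop{address}b")


cache = UopCache(capacity=4)
trace = [0, 1, 2, 3] * 10  # a four-instruction loop executed ten times
for addr in trace:
    cache.fetch(addr, toy_decode)

eliminated = cache.hits / len(trace)  # fraction of decodes avoided
print(f"decodes: {cache.decodes}, eliminated: {eliminated:.0%}")
```

In this toy trace the loop fits entirely in the cache, so only the first iteration goes through the decoder. A loop larger than the capacity would thrash under LRU and avoid no decodes at all, which is one reason the cache size matters.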