PARROT: power awareness through selective dynamically optimized traces

Authors:
Roni Rosner;Yoav Almog;Micha Moffie;Naftali Schwartz;Avi Mendelson
Affiliations:
Microprocessor Research, Intel Labs, Haifa, Israel;Microprocessor Research, Intel Labs, Haifa, Israel;Microprocessor Research, Intel Labs, Haifa, Israel;Microprocessor Research, Intel Labs, Haifa, Israel;Microprocessor Research, Intel Labs, Haifa, Israel
Venue:
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Year:
2003

Citing 23
Cited 3

Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhancing instruction scheduling with a block-structured ISA

International Journal of Parallel Programming
Exploiting instruction level parallelism in processors by caching scheduled groups

Proceedings of the 24th annual international symposium on Computer architecture
DAISY: dynamic compilation for 100% architectural compatibility

Proceedings of the 24th annual international symposium on Computer architecture
Path-based next trace prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Trace Cache Microarchitecture and Evaluation

IEEE Transactions on Computers - Special issue on cache memory and related problems
A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A hardware mechanism for dynamic extraction and relayout of program hot spots

Proceedings of the 27th annual international symposium on Computer architecture
Increasing the size of atomic instruction blocks using control flow assertions

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Micro-operation cache: a power aware frontend for the variable instruction length ISA

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
rePLay: A Hardware Framework for Dynamic Optimization

IEEE Transactions on Computers
Performance characterization of a hardware mechanism for dynamic optimization

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Dynamic and Transparent Binary Translation

Computer
Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors

IEEE Micro
Filtering Techniques to Improve Trace-Cache Efficiency

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Optimizing pipelines for power and performance

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Selecting long atomic traces for high coverage

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Dynamic Optimization of Micro-Operations

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Performance and Hardware Complexity Tradeoffs in Designing Multithreaded Architectures

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
VLIW Scheduling for Energy and Performance

WVLSI '01 Proceedings of the IEEE Computer Society Workshop on VLSI 2001
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Trace Cache Sampling Filter

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present the PARROT concept aimed at both higher performance and power-awareness. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations and pipeline decoupling. We employ a gradual and selective approach for applying complex mechanisms only for the most frequently used traces to maximize the performance gain at any given power constraint, thus attaining finer control of tradeoffs between performance and power awareness. We show that the PARROT microarchitecture delivers performance increases comparable to those available through conventional doubling of execution resources (average 16% IPC improvement). This improvement comes through better utilization of all available resources with the combination of a trace cache and selective trace optimization. On the other hand, performance advantage of a trace cache alone is limited to wide-machine configurations. No less critical, however, is power awareness. The PARROT microarchitecture delivers the performance increase at a comparable energy level, whereas the conventional path to higher performance consumes an average 70% more energy. Meanwhile, for those designs which can tolerate a higher power budget, PARROT gracefully scales up to use additional execution resources in a uniformly efficient manner. In particular, a PARROT-style doubly-wide machine delivers an average 45% IPC improvement while actually improving the Cubic- MIPS-per-WATT power awareness metric by over 50%.