Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture

Authors:
Yoav Almog, Roni Rosner, Naftali Schwartz, Ari Schmorak
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2004

Citing 25

Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture

Enhancing instruction scheduling with a block-structured ISA
International Journal of Parallel Programming

Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture

Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture

DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture

Path-based next trace prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture

Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture

A Trace Cache Microarchitecture and Evaluation
IEEE Transactions on Computers - Special issue on cache memory and related problems

A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture

Software trace cache
ICS '99 Proceedings of the 13th international conference on Supercomputing

Trace preconstruction
Proceedings of the 27th annual international symposium on Computer architecture

Increasing the size of atomic instruction blocks using control flow assertions
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture

rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers

Performance characterization of a hardware mechanism for dynamic optimization
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture

Dynamic and Transparent Binary Translation
Computer

Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors
IEEE Micro

Filtering Techniques to Improve Trace-Cache Efficiency
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques

Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture

Selecting long atomic traces for high coverage
ICS '03 Proceedings of the 17th annual international conference on Supercomputing

Dynamic Optimization of Micro-Operations
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture

VLIW Scheduling for Energy and Performance
WVLSI '01 Proceedings of the IEEE Computer Society Workshop on VLSI 2001

IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power-Aware Computer Systems

Cited 7

Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture

Continuous Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture

Predictor virtualization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems

TAO: two-level atomicity for dynamic binary optimizations
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)

PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power-Aware Computer Systems

LAR-CC: Large atomic regions with conditional commits
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Abstract

We study several major characteristics of dynamic optimization within the PARROT power-aware, trace-cache-based microarchitectural framework. We investigate the benefit of providing optimizations which, although tightly coupled with the microarchitecture in substance, are decoupled in time.

The tight coupling in substance makes it possible to tailor optimizations to the microarchitecture in a manner that is impossible or impractical not only for traditional static compilers but even for a JIT. We show that the contribution of common, generic optimizations to processor performance and energy efficiency may be more than doubled by creating a more intimate correlation between hardware specifics and the optimizer. In particular, dynamic optimizations can profit greatly from hardware that supports fused and SIMDified operations.

At the same time, the decoupling in time allows optimizations to be arbitrarily aggressive without significant performance loss. We demonstrate that requiring up to 512 repetitions before a trace is optimized sacrifices almost no performance or efficiency compared with lower thresholds. These results confirm the feasibility of an energy-efficient hardware implementation of an aggressive optimizer.
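Two ideas from the abstract, a repetition threshold that gates when a trace is optimized and hardware support for SIMDified operations, can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not PARROT's design. The `Uop` format, the `PACKABLE` opcode set, the packed `padd`/`pmul` opcodes, the greedy pairing rule in `simdify` and the `TraceOptimizer` class are all hypothetical. Only the 512-repetition threshold comes from the abstract, and fused operations are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical set of scalar opcodes that the assumed hardware can execute
# as two-lane packed ("p"-prefixed) micro-ops.
PACKABLE = {"add", "mul"}


@dataclass(frozen=True)
class Uop:
    """Hypothetical micro-op: opcode, destination registers, source registers."""
    op: str
    dsts: Tuple[str, ...]
    srcs: Tuple[str, ...]


def simdify(uops: List[Uop]) -> List[Uop]:
    """Greedily pack adjacent, independent same-opcode micro-ops into one
    hypothetical two-lane packed micro-op."""
    out: List[Uop] = []
    i = 0
    while i < len(uops):
        a = uops[i]
        if i + 1 < len(uops):
            b = uops[i + 1]
            # Assumed semantics: the packed op reads all sources before writing
            # any destination, so only read-after-write and write-after-write
            # conflicts between a and b block packing.
            independent = (
                not set(a.dsts) & set(b.srcs)
                and not set(a.dsts) & set(b.dsts)
            )
            if a.op == b.op and a.op in PACKABLE and independent:
                out.append(Uop("p" + a.op, a.dsts + b.dsts, a.srcs + b.srcs))
                i += 2
                continue
        out.append(a)
        i += 1
    return out


@dataclass
class TraceOptimizer:
    """Toy threshold-triggered trace optimizer (illustrative, not PARROT's design)."""
    threshold: int = 512  # executions a trace must reach before it is optimized
    counts: Dict[int, int] = field(default_factory=dict)
    optimized: Dict[int, List[Uop]] = field(default_factory=dict)

    def fetch(self, trace_id: int, uops: List[Uop]) -> List[Uop]:
        """Return the micro-ops to execute for one instance of a trace."""
        if trace_id in self.optimized:
            return self.optimized[trace_id]  # hot trace with an optimized copy
        n = self.counts.get(trace_id, 0) + 1
        self.counts[trace_id] = n
        if n == self.threshold:
            # Decoupled in time: the optimized copy is installed now but only
            # used from the next execution on; this one runs the original.
            self.optimized[trace_id] = simdify(uops)
        return uops


if __name__ == "__main__":
    trace = [
        Uop("add", ("r1",), ("r2", "r3")),
        Uop("add", ("r4",), ("r5", "r6")),  # independent of the first add
        Uop("mul", ("r7",), ("r1", "r4")),  # depends on both adds
    ]
    opt = TraceOptimizer()
    for _ in range(511):
        opt.fetch(0, trace)
    print(len(opt.fetch(0, trace)))  # 512th execution, still original: 3
    print(len(opt.fetch(0, trace)))  # 513th execution, optimized: 2
```

In this model, raising `threshold` only postpones the point at which a hot trace switches to its optimized form. That is one way to read the abstract's finding that thresholds as high as 512 cost almost no performance or efficiency: for traces that repeat far more often than the threshold, the delay is a small fraction of their lifetime.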