An Architectural Framework for Runtime Optimization

Authors: Matthew C. Merten; Andrew R. Trick; Ronald D. Barnes
Affiliations: Univ. of Illinois, Urbana; Univ. of Illinois, Urbana; Univ. of Illinois, Urbana
Venue: IEEE Transactions on Computers
Year: 2001

Abstract

Wide-issue processors continue to achieve higher performance by exploiting greater instruction-level parallelism. Dynamic techniques such as out-of-order execution and hardware speculation have proven effective at increasing instruction throughput. Runtime optimization promises to provide an even higher level of performance by adaptively applying aggressive code transformations over a larger scope. This paper presents a new hardware mechanism for generating and deploying runtime-optimized code. The mechanism can be viewed as a filtering system that resides in the retirement stage of the processor pipeline, accepts an instruction execution stream as input, and produces instruction profiles and sets of linked, optimized traces as output. The code deployment mechanism uses an extension to the branch prediction mechanism to migrate execution into the new code without modifying the original code. These new components do not add delay to the execution of the program except during short bursts of reoptimization. This technique provides a strong platform for runtime optimization because the hot execution regions are extracted, optimized, and written to main memory for execution, and because these regions persist across context switches. The current design of the framework supports a suite of optimizations, including partial function inlining (even into shared libraries), code straightening optimizations, loop unrolling, and peephole optimizations.
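
The filtering view in the abstract can be illustrated with a small software analogy. The sketch below is not the authors' hardware design: the class name `RetirementFilter`, the threshold value, the addresses, and the dictionary standing in for the branch-predictor extension are all hypothetical, chosen only to show the flow from retired branches to a profile, to hot-region selection, and to redirection that leaves the original code untouched.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: executions before a branch edge counts as hot


class RetirementFilter:
    """Toy model of a filter that watches retired branches (illustrative only)."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.profile = Counter()  # (branch_pc, target_pc) -> times retired
        self.redirects = {}       # original entry pc -> optimized trace address

    def retire(self, branch_pc, target_pc):
        """Record one retired taken branch in the profile."""
        self.profile[(branch_pc, target_pc)] += 1

    def hot_edges(self):
        """Return the branch edges whose count reached the threshold, sorted."""
        return sorted(e for e, n in self.profile.items() if n >= self.threshold)

    def deploy(self, entry_pc, trace_addr):
        """Register a redirect; the original code at entry_pc is not modified."""
        self.redirects[entry_pc] = trace_addr

    def next_fetch(self, predicted_target):
        """Stand-in for the predictor extension: steer into optimized code if any."""
        return self.redirects.get(predicted_target, predicted_target)


def demo():
    """Run a short, made-up branch stream through the filter."""
    f = RetirementFilter()
    stream = [(0x10, 0x40), (0x48, 0x10), (0x10, 0x40),
              (0x48, 0x10), (0x10, 0x40), (0x20, 0x80)]
    for branch_pc, target_pc in stream:
        f.retire(branch_pc, target_pc)
    for _, target_pc in f.hot_edges():
        # 0x9000 is a made-up address where an optimized trace would live.
        f.deploy(target_pc, 0x9000)
    return f


if __name__ == "__main__":
    f = demo()
    print(f.hot_edges(), hex(f.next_fetch(0x40)), hex(f.next_fetch(0x80)))
```

In this toy model only the edge from 0x10 to 0x40 reaches the threshold, so fetches predicted to go to 0x40 are steered to the stand-in trace address while every other target is returned unchanged. Trace formation, linking, and the optimizations themselves are deliberately omitted.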