Reusing cached schedules in an out-of-order processor with in-order issue logic

Authors:
Oscar Palomar;Toni Juan;Juan J. Navarro
Affiliations:
Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya and Barcelona Supercomputing Center;Intel® Corporation;Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya.
Venue:
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Year:
2009

Citing 20
Cited 1

Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Exploiting instruction level parallelism in processors by caching scheduled groups

Proceedings of the 24th annual international symposium on Computer architecture
Target prediction for indirect jumps

Proceedings of the 24th annual international symposium on Computer architecture
Power considerations in the design of the Alpha 21264 microprocessor

DAC '98 Proceedings of the 35th annual Design Automation Conference
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
rePLay: A Hardware Framework for Dynamic Optimization

IEEE Transactions on Computers
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Improving quasi-dynamic schedules through region slip

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Scheduling Reusable Instructions for Power Reduction

Proceedings of the conference on Design, automation and test in Europe - Volume 1
Execution cache-based microarchitecture power-efficient superscalar processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines

Proceedings of the 32nd annual international symposium on Computer Architecture
Creating Converged Trace Schedules Using String Matching

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Incremental Commit Groups for Non-Atomic Trace Processing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Power-efficient instruction delivery through trace reuse

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Low power microarchitecture with instruction reuse

Proceedings of the 5th conference on Computing frontiers
IBM POWER6 microarchitecture

IBM Journal of Research and Development

Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The complex and powerful out-of-order issue logic dismisses the repetitive nature of the code, unlike what caches or branch predictors do. We show that 90% of the cycles, the group of instructions selected by the issue logic belongs to just 13% of the total different groups issued: the issue logic of an out-of-order processor is constantly re-discovering what it has already found. To benefit from the repetitive nature of instruction issue, we move the scheduling logic after the commit stage, out of the critical path of execution. The schedules created there are cached and reused to feed a simple in-order issue logic, that could result in a higher frequency design. We present the complete design of our ReLaSch processor, that achieves the same average IPC than a conventional out-of-order processor, and a 1.56 speed-up over the IPC of an in-order processor. We actually surpass the out-of-order IPC in 23 out of 40 SPEC benchmarks, mainly because the broader vision of the code after the commit stage allows creating better schedules.