Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

  • Authors:
  • Dan Ernst;Andrew Hamel;Todd Austin

  • Affiliations:
  • University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI

  • Venue:
  • Proceedings of the 30th annual international symposium on Computer architecture
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

To achieve high instruction throughput, instruction schedulers must be capable of producing high-quality schedules that maximize functional unit utilization while at the same time enabling fast instruction issue logic. Many solutions exist to the scheduling problem, ranging from compile-time to run-time approaches. Compile-time solutions feature fast and simple hardware, but at the expense of conservative schedules. Dynamic schedulers produce high-quality schedules that incorporate run-time information and dependence speculation, but implementing these schedulers requires complex circuits that can slow processor clock speeds.In this paper, we present the Cyclone scheduler, a novel design that captures the benefits of both compileand run-time scheduling. Our approach utilizes a list-based single-pass instruction scheduling algorithm, implemented by hardware at run-time in the front end of the processor pipeline. Once scheduled, instructions are injected into a timed queue that orchestrates their entry into execution. To accommodate branch and load/store dependence speculation, the Cyclone scheduler supports a simple selective replay mechanism. We implement this technique by overloading instruction register forwarding to also detect instructions dependent on incorrectly scheduled operations. Detailed simulation analyses suggest that with sufficient queue width, the Cyclone scheduler can rival the instruction throughput of similarly wide monolithic dynamic schedulers. Furthermore, the circuit complexity of the Cyclone scheduler is much more favorable than a broadcast-based scheduler, as our approach requires no global control signals.