Software support for speculative loads
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Digital design: principles and practices (2nd ed.)
Digital design: principles and practices (2nd ed.)
Sentinel scheduling: a model for compiler-controlled speculative execution
ACM Transactions on Computer Systems (TOCS)
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Advanced compiler design and implementation
Advanced compiler design and implementation
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Counterflow Pipeline Processor Architecture
IEEE Design & Test
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies
IEEE Transactions on Computers
Recovery Mechanism for Latency Misprediction
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Non-Stalling CounterFlow Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Predicate prediction for efficient out-of-order execution
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Scaling the issue window with look-ahead latency prediction
Proceedings of the 18th annual international conference on Supercomputing
An efficient wakeup design for energy reduction in high-performance superscalar processors
Proceedings of the 2nd conference on Computing frontiers
Instruction packing: reducing power and delay of the dynamic scheduling logic
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Tornado warning: the perils of selective replay in multithreaded processors
Proceedings of the 19th annual international conference on Supercomputing
Low-power, low-complexity instruction issue using compiler assistance
Proceedings of the 19th annual international conference on Supercomputing
Reducing the Energy of Speculative Instruction Schedulers
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Power-Efficient Wakeup Tag Broadcast
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction packing: Toward fast and energy-efficient instruction scheduling
ACM Transactions on Architecture and Code Optimization (TACO)
SEED: scalable, efficient enforcement of dependences
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A scalable low power issue queue for large instruction window processors
Proceedings of the 20th annual international conference on Supercomputing
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
Proceedings of the 34th annual international symposium on Computer architecture
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
IEEE Transactions on Computers
Communications of the ACM - Web science
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Gossiping for threshold detection
IM'09 Proceedings of the 11th IFIP/IEEE international conference on Symposium on Integrated Network Management
Reusing cached schedules in an out-of-order processor with in-order issue logic
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Wake-up logic optimizations through selective match and wakeup range limitation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Energy-efficient mechanisms for managing thread context in throughput processors
Proceedings of the 38th annual international symposium on Computer architecture
Non-uniform instruction scheduling
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Instruction recirculation: eliminating counting logic in wakeup-free schedulers
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
Compiler directed issue queue energy reduction
Transactions on High-Performance Embedded Architectures and Compilers IV
Hi-index | 0.01 |
To achieve high instruction throughput, instruction schedulers must be capable of producing high-quality schedules that maximize functional unit utilization while at the same time enabling fast instruction issue logic. Many solutions exist to the scheduling problem, ranging from compile-time to run-time approaches. Compile-time solutions feature fast and simple hardware, but at the expense of conservative schedules. Dynamic schedulers produce high-quality schedules that incorporate run-time information and dependence speculation, but implementing these schedulers requires complex circuits that can slow processor clock speeds.In this paper, we present the Cyclone scheduler, a novel design that captures the benefits of both compileand run-time scheduling. Our approach utilizes a list-based single-pass instruction scheduling algorithm, implemented by hardware at run-time in the front end of the processor pipeline. Once scheduled, instructions are injected into a timed queue that orchestrates their entry into execution. To accommodate branch and load/store dependence speculation, the Cyclone scheduler supports a simple selective replay mechanism. We implement this technique by overloading instruction register forwarding to also detect instructions dependent on incorrectly scheduled operations. Detailed simulation analyses suggest that with sufficient queue width, the Cyclone scheduler can rival the instruction throughput of similarly wide monolithic dynamic schedulers. Furthermore, the circuit complexity of the Cyclone scheduler is much more favorable than a broadcast-based scheduler, as our approach requires no global control signals.