Hierarchical Scheduling Windows

Authors:
Edward Brekelbaum;Jeff Rupley;Chris Wilkerson;Bryan Black
Affiliations:
Intel Labs (formerly known as MRL);Intel Labs (formerly known as MRL);Intel Labs (formerly known as MRL);Intel Labs (formerly known as MRL)
Venue:
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Year:
2002

Citing 14
Cited 31

Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Dynamic dependency analysis of ordinary programs

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The performance impact of incomplete bypassing in processor pipelines

Proceedings of the 28th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Slack: maximizing performance under technological constraints

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor

IEEE Micro
Recovery Mechanism for Latency Misprediction

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Dynamic Branch Prediction with Perceptrons

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Fast Path-Based Neural Branch Prediction

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Late Allocation and Early Release of Physical Registers

IEEE Transactions on Computers
A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
A scalable, clustered SMT processor for digital signal processing

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Decoupled Software Pipelining with the Synchronization Array

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Toward kilo-instruction processors

ACM Transactions on Architecture and Code Optimization (TACO)
Improved latency and accuracy for neural branch prediction

ACM Transactions on Computer Systems (TOCS)
Piecewise Linear Branch Prediction

Proceedings of the 32nd annual international symposium on Computer Architecture
Instruction packing: reducing power and delay of the dynamic scheduling logic

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Power-Efficient Wakeup Tag Broadcast

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction packing: Toward fast and energy-efficient instruction scheduling

ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting Operand Availability for Efficient Simultaneous Multithreading

IEEE Transactions on Computers
Unified microprocessor core storage

Proceedings of the 4th international conference on Computing frontiers
Matrix scheduler reloaded

Proceedings of the 34th annual international symposium on Computer architecture
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality

IEEE Transactions on Computers
Future ILP processors

International Journal of High Performance Computing and Networking
A distributed, simultaneously multi-threaded (SMT) processor with clustered scheduling windows for scalable DSP performance

Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Streamlining long latency instructions for seamlessly combined out-of-order and in-order execution

Microprocessors & Microsystems
A low-complexity microprocessor design with speculative pre-execution

Journal of Systems Architecture: the EUROMICRO Journal
Generalizing neural branch prediction

ACM Transactions on Architecture and Code Optimization (TACO)
A complexity-effective microprocessor design with decoupled dispatch queues and prefetching

Parallel Computing
Design and optimization of the store vectors memory dependence predictor

ACM Transactions on Architecture and Code Optimization (TACO)
Federation: Boosting per-thread performance of throughput-oriented manycore architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient mechanisms for managing thread context in throughput processors

Proceedings of the 38th annual international symposium on Computer architecture
Non-uniform instruction scheduling

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Instruction recirculation: eliminating counting logic in wakeup-free schedulers

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Reducing delay and power consumption of the wakeup logic through instruction packing and tag memoization

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors

ACM Transactions on Computer Systems (TOCS)
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.01

Visualization

Abstract

Large scheduling windows are an effective mechanism for increasing microprocessor performance through the extraction of instruction level parallelism. Current techniques do not scale effectively for very large windows, leading to slow wakeup and select logic as well as large complicated bypass networks. This paper introduces a new instruction scheduler implementation, referred to as Hierarchical Scheduling Windows or HSW, which exploits latency tolerant instructions in order to reduce implementation complexity. HSW yields a very large instruction window that tolerates wakeup, select, and bypass latency, while extracting significant far-flung ILP.Results: It is shown that HSW loses