Select-free instruction scheduling logic

Authors:
Mary D. Brown;Jared Stark;Yale N. Patt
Affiliations:
The University of Texas, Austin;Intel Corporation;The University of Texas, Austin
Venue:
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Year:
2001

Citing 9
Cited 49

Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
A low-complexity issue logic

Proceedings of the 14th international conference on Supercomputing
Circuits for wide-window superscalar processors

Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Design of High-Performance Microprocessor Circuits

Design of High-Performance Microprocessor Circuits
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
Recovery Mechanism for Latency Misprediction

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Superscalar Execution with Direct Data Forwarding

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture

The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Energy-efficient hybrid wakeup logic

Proceedings of the 2002 international symposium on Low power electronics and design
A case for dynamic pipeline scaling

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimizing pipelines for power and performance

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Half-price architecture

Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Macro-op Scheduling: Relaxing Scheduling Loop Constraints

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Using Dynamic Binary Translation to Fuse Dependent Instructions

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy

Proceedings of the 31st annual international symposium on Computer architecture
A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
A scalable, clustered SMT processor for digital signal processing

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
An analysis of a resource efficient checkpoint architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Configuration Steering for a Reconfigurable Superscalar Processor

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Instruction packing: reducing power and delay of the dynamic scheduling logic

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Power-Efficient Wakeup Tag Broadcast

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Instruction packing: Toward fast and energy-efficient instruction scheduling

ACM Transactions on Architecture and Code Optimization (TACO)
SEED: scalable, efficient enforcement of dependences

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SlicK: slice-based locality exploitation for efficient redundant multithreading

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A scalable low power issue queue for large instruction window processors

Proceedings of the 20th annual international conference on Supercomputing
Exploiting Operand Availability for Efficient Simultaneous Multithreading

IEEE Transactions on Computers
By-passing the out-of-order execution pipeline to increase energy-efficiency

Proceedings of the 4th international conference on Computing frontiers
Matrix scheduler reloaded

Proceedings of the 34th annual international symposium on Computer architecture
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality

IEEE Transactions on Computers
Building a large instruction window through ROB compression

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Future ILP processors

International Journal of High Performance Computing and Networking
Reducing the impact of intra-core process variability with criticality-based resource allocation and prefetching

Proceedings of the 5th conference on Computing frontiers
Complexity Effective Bypass Networks

Transactions on High-Performance Embedded Architectures and Compilers II
An energy-efficient instruction scheduler design with two-level shelving and adaptive banking

Journal of Computer Science and Technology
An energy-efficient checkpointing mechanism for out of order commit processor

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Design and optimization of the store vectors memory dependence predictor

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing branch misprediction penalties via adaptive pipeline scaling

HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Out-of-order issue logic using sorting networks

Proceedings of the 20th symposium on Great lakes symposium on VLSI
Wake-up logic optimizations through selective match and wakeup range limitation

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CROB: implementing a large instruction window through compression

Transactions on high-performance embedded architectures and compilers III
FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template

Proceedings of the 38th annual international symposium on Computer architecture
Non-uniform instruction scheduling

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Instruction recirculation: eliminating counting logic in wakeup-free schedulers

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Single FU bypass networks for high clock rate superscalar processors

HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Speculative issue logic

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Reducing delay and power consumption of the wakeup logic through instruction packing and tag memoization

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
A scalable, multi-thread, multi-issue array processor architecture for DSP applications based on extended tomasulo scheme

SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Low complexity out-of-order issue logic using static circuits

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Pipelining allows processors to exploit parallelism. Unfortunately, critical loops---pieces of logic that must evaluate in a single cycle to meet IPC (Instructions Per Cycle) goals---prevent deeper pipelining. In today's processors, one of these loops is the instruction scheduling (wakeup and select) logic [10]. This paper describes a technique that pipelines this loop by breaking it into two smaller loops: a critical, single-cycle loop for wakeup; and a non-critical, potentially multi-cycle, loop for select. For the 12 SPECint*2000 benchmarks, a machine with two-cycle select logic (i. e., three-cycle scheduling logic) using this technique has an average IPC 15% greater than a machine with three-cycle pipelined conventional scheduling logic, and an IPC within 3% of a machine of the same pipeline depth and one-cycle (ideal) scheduling logic. Since select accounts for more than half the scheduling latency [10], this technique could significantly increase clock frequency while having minimal impact on IPC.