Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 14th international conference on Supercomputing
Circuits for wide-window superscalar processors
Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Design of High-Performance Microprocessor Circuits
Design of High-Performance Microprocessor Circuits
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Recovery Mechanism for Latency Misprediction
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Superscalar Execution with Direct Data Forwarding
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Energy-efficient hybrid wakeup logic
Proceedings of the 2002 international symposium on Low power electronics and design
A case for dynamic pipeline scaling
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Macro-op Scheduling: Relaxing Scheduling Loop Constraints
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 31st annual international symposium on Computer architecture
A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Configuration Steering for a Reconfigurable Superscalar Processor
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Instruction packing: reducing power and delay of the dynamic scheduling logic
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Power-Efficient Wakeup Tag Broadcast
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Instruction packing: Toward fast and energy-efficient instruction scheduling
ACM Transactions on Architecture and Code Optimization (TACO)
SEED: scalable, efficient enforcement of dependences
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A scalable low power issue queue for large instruction window processors
Proceedings of the 20th annual international conference on Supercomputing
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
By-passing the out-of-order execution pipeline to increase energy-efficiency
Proceedings of the 4th international conference on Computing frontiers
Proceedings of the 34th annual international symposium on Computer architecture
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
IEEE Transactions on Computers
Building a large instruction window through ROB compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
International Journal of High Performance Computing and Networking
Proceedings of the 5th conference on Computing frontiers
Complexity Effective Bypass Networks
Transactions on High-Performance Embedded Architectures and Compilers II
An energy-efficient instruction scheduler design with two-level shelving and adaptive banking
Journal of Computer Science and Technology
An energy-efficient checkpointing mechanism for out of order commit processor
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Design and optimization of the store vectors memory dependence predictor
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing branch misprediction penalties via adaptive pipeline scaling
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Out-of-order issue logic using sorting networks
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Wake-up logic optimizations through selective match and wakeup range limitation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CROB: implementing a large instruction window through compression
Transactions on high-performance embedded architectures and compilers III
Proceedings of the 38th annual international symposium on Computer architecture
Non-uniform instruction scheduling
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Instruction recirculation: eliminating counting logic in wakeup-free schedulers
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Single FU bypass networks for high clock rate superscalar processors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Low complexity out-of-order issue logic using static circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.01 |
Pipelining allows processors to exploit parallelism. Unfortunately, critical loops---pieces of logic that must evaluate in a single cycle to meet IPC (Instructions Per Cycle) goals---prevent deeper pipelining. In today's processors, one of these loops is the instruction scheduling (wakeup and select) logic [10]. This paper describes a technique that pipelines this loop by breaking it into two smaller loops: a critical, single-cycle loop for wakeup; and a non-critical, potentially multi-cycle, loop for select. For the 12 SPECint*2000 benchmarks, a machine with two-cycle select logic (i. e., three-cycle scheduling logic) using this technique has an average IPC 15% greater than a machine with three-cycle pipelined conventional scheduling logic, and an IPC within 3% of a machine of the same pipeline depth and one-cycle (ideal) scheduling logic. Since select accounts for more than half the scheduling latency [10], this technique could significantly increase clock frequency while having minimal impact on IPC.