Matrix scheduler reloaded

Authors:
Peter G. Sassone;Jeff Rupley, II;Edward Brekelbaum;Gabriel H. Loh;Bryan Black
Affiliations:
Intel Microarchitecture Research Lab (MRL), Austin, TX;Intel Microarchitecture Research Lab (MRL), Austin, TX;Intel Microarchitecture Research Lab (MRL), Austin, TX;Georgia Inst of Technology, Atlanta, GA;Intel Microarchitecture Research Lab (MRL), Austin, TX
Venue:
Proceedings of the 34th annual international symposium on Computer architecture
Year:
2007

Citing 24
Cited 7

An investigation of the performance of various dynamic scheduling techniques

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Implementation of precise interrupts in pipelined processors

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Slack: maximizing performance under technological constraints

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor

COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
Virtual-Physical Registers

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Half-price architecture

Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Dynamic Prediction of Critical Path Instructions

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures

IEEE Transactions on Parallel and Distributed Systems
Static strands: safely collapsing dependence chains for increasing embedded power efficiency

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
POWER5 System microarchitecture

IBM Journal of Research and Development - POWER5 and packaging

A modular 3d processor for flexible product design and technology migration

Proceedings of the 5th conference on Computing frontiers
Federation: repurposing scalar cores for out-of-order instruction issue

Proceedings of the 45th annual Design Automation Conference
Design and optimization of the store vectors memory dependence predictor

ACM Transactions on Architecture and Code Optimization (TACO)
A physical-level study of the compacted matrix instruction scheduler for dynamically-scheduled superscalar processors

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Forwardflow: a scalable core for power-constrained CMPs

Proceedings of the 37th annual international symposium on Computer architecture
Federation: Boosting per-thread performance of throughput-oriented manycore architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Low complexity out-of-order issue logic using static circuits

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

From multiprocessor scale-up to cache sizes to the number of reorder-buffer entries, microarchitects wish to reap the benefits of more computing resources while staying within power and latency bounds. This tension is quite evident in schedulers, which need to be large and single-cycle for maximum performance on out-of-order cores. In this work we present two straightforward modifications to a matrix scheduler implementation which greatly strengthen its scalability. Both are based on the simple observation that the wakeup and picker matrices are sparse, even at small sizes; thus small indirection tables can be used to greatly reduce their width and latency. This technique can be used to create quicker iso-performance schedulers (17-58% reduced critical path) or larger iso-timing schedulers (7-26% IPC increase). Importantly, the power and area requirements of the additional hardware are likely offset by the greatly reduced matrix sizes and subsuming the functionality of the power-hungry allocation CAMs.