MPS: Miss-Path Scheduling for Multiple-Issue Processors

Authors:
Sanjeev Banerjia; Sumedh W. Sathaye; Kishore N. Menezes; Thomas M. Conte
Affiliations:
Hewlett-Packard Labs, Cambridge, MA; IBM T. J. Watson Research Center, Yorktown Heights, NY; Intel Corp., Santa Clara, CA; North Carolina State Univ., Raleigh
Venue:
IEEE Transactions on Computers
Year:
1998

Abstract

Many contemporary multiple-issue processors employ out-of-order scheduling hardware in the processor pipeline. Such hardware can yield good performance without relying on compile-time scheduling, and it can schedule around unexpected run-time events such as cache misses. As issue widths increase, however, the complexity of the scheduling hardware grows considerably and can affect the processor's cycle time. This paper presents the design of a multiple-issue processor that uses an alternative approach called miss-path scheduling (MPS). Scheduling hardware is removed from the processor pipeline altogether and placed on the path between the instruction cache and the next level of memory. Scheduling is performed at cache-miss time, as instructions are received from memory, and the scheduled blocks of instructions are issued to an aggressively clocked in-order execution core. Details of a hardware scheduler that can perform speculation are outlined and shown to be feasible. Simulation results are presented that highlight the effectiveness of an MPS design.
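
The idea of scheduling on the miss path can be illustrated with a small software model. The sketch below is not the paper's hardware design: it assumes unit-latency instructions, register dependences only (no memory dependences, branches, or speculation), and that reads happen before writes within one issue group. The names `Instr`, `schedule_block`, and `MissPathICache` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...] = ()   # source registers


def schedule_block(instrs: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily pack a block into issue groups of at most `width` instructions.

    Assumption: each group issues in one cycle on an in-order core, so a
    consumer must sit in a later group than its producer.
    """
    groups: List[List[Instr]] = []
    last_write: Dict[str, int] = {}  # register -> group of its latest writer
    last_read: Dict[str, int] = {}   # register -> latest group that reads it
    for ins in instrs:
        earliest = 0
        for src in ins.srcs:                      # read-after-write
            if src in last_write:
                earliest = max(earliest, last_write[src] + 1)
        if ins.dest is not None:
            if ins.dest in last_write:            # write-after-write
                earliest = max(earliest, last_write[ins.dest] + 1)
            if ins.dest in last_read:             # write-after-read
                earliest = max(earliest, last_read[ins.dest])
        g = earliest
        while g < len(groups) and len(groups[g]) >= width:
            g += 1                                # skip groups that are full
        while len(groups) <= g:
            groups.append([])
        groups[g].append(ins)
        for src in ins.srcs:
            last_read[src] = max(last_read.get(src, -1), g)
        if ins.dest is not None:
            last_write[ins.dest] = g
    return groups


class MissPathICache:
    """Toy instruction cache that schedules a block only when it misses."""

    def __init__(self, memory: Dict[int, List[Instr]], width: int) -> None:
        self.memory = memory
        self.width = width
        self.lines: Dict[int, List[List[Instr]]] = {}
        self.misses = 0

    def fetch(self, addr: int) -> List[List[Instr]]:
        if addr not in self.lines:                # miss: schedule on the fill path
            self.misses += 1
            self.lines[addr] = schedule_block(self.memory[addr], self.width)
        return self.lines[addr]                   # hit: reuse the stored schedule


block = [
    Instr("i0", "r1"),
    Instr("i1", "r2", ("r1",)),
    Instr("i2", "r3"),
    Instr("i3", "r4", ("r3", "r2")),
    Instr("i4", "r5"),
]
cache = MissPathICache({0x100: block}, width=2)
first = cache.fetch(0x100)    # miss: the block is scheduled here
second = cache.fetch(0x100)   # hit: no scheduling work is repeated
names = [[ins.name for ins in grp] for grp in first]
# names == [["i0", "i2"], ["i1", "i4"], ["i3"]]
```

In this toy model the five-instruction block is packed into three issue groups on the first fetch, and the second fetch reuses the stored schedule, which mirrors the abstract's point that the scheduling cost is paid at cache-miss time instead of in the execution pipeline.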