Boosting beyond static scheduling in a superscalar processor

Authors:
Michael D. Smith; Monica S. Lam; Mark A. Horowitz
Affiliations:
Computer Systems Laboratory, Stanford University, Stanford, CA; Computer Systems Laboratory, Stanford University, Stanford, CA; Computer Systems Laboratory, Stanford University, Stanford, CA
Venue:
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Year:
1990

Abstract

This paper describes a superscalar processor that combines the best qualities of static and dynamic instruction scheduling to increase the performance of non-numerical applications. The architecture performs all instruction scheduling statically to take advantage of the compiler's ability to efficiently schedule operations across many basic blocks. Since the conditional branches in non-numerical code are highly data dependent, the architecture introduces the concept of boosted instructions, instructions that are committed conditionally upon the result of later branch instructions. Boosting effectively removes the dependencies caused by branches and makes the scheduling of side-effect instructions as simple as those that are side-effect free. For efficiency, boosting is supported in the hardware by shadow structures that temporarily hold the side effects of boosted instructions until the conditional branches that the boosted instructions depend upon are executed. When the branch condition is determined, the buffered side effects are either committed or squashed. The limited static scheduler in our evaluation system shows that a 1.6-times speedup over scalar code is achievable by boosting instructions above only a single conditional branch. This performance is similar to the performance of a pure dynamic scheduler.
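
The commit-or-squash behavior described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: the class `ShadowRegisterFile`, its methods, and the `assumed_path_taken` flag are hypothetical names chosen for illustration. It assumes a single level of boosting (one pending conditional branch) and models only register writes, not stores or exceptions.

```python
class ShadowRegisterFile:
    """Toy model of boosting above a single conditional branch.

    Hypothetical illustration only; names and structure are assumptions,
    not the design described in the paper.
    """

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs  # sequential (committed) register state
        self.shadow = {}            # buffered results of boosted instructions

    def write(self, reg, value, boosted=False):
        # A boosted write is held in the shadow structure instead of
        # changing the committed state.
        if boosted:
            self.shadow[reg] = value
        else:
            self.regs[reg] = value

    def read(self, reg, boosted=False):
        # A boosted instruction sees earlier boosted results;
        # a normal instruction sees only committed state.
        if boosted and reg in self.shadow:
            return self.shadow[reg]
        return self.regs[reg]

    def resolve_branch(self, assumed_path_taken):
        # When the branch condition is known, the buffered side effects
        # are either committed or squashed.
        if assumed_path_taken:
            for reg, value in self.shadow.items():
                self.regs[reg] = value
        self.shadow.clear()


# Usage: boost a write to r1 above a branch, then resolve the branch.
rf = ShadowRegisterFile()
rf.write(1, 10)                 # ordinary write, committed immediately
rf.write(1, 99, boosted=True)   # boosted write, buffered in the shadow
assert rf.read(1) == 10         # committed state is unchanged so far
rf.resolve_branch(assumed_path_taken=False)
assert rf.read(1) == 10         # squashed: the boosted result is discarded

rf.write(2, 7, boosted=True)
rf.resolve_branch(assumed_path_taken=True)
assert rf.read(2) == 7          # committed: the boosted result becomes visible
```

Keeping boosted results in a separate structure is what lets a static scheduler move a side-effecting instruction above a branch in this model: the committed state never has to be undone, because it is only updated once the branch outcome is known.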