Efficient superscalar performance through boosting

Authors:
Michael D. Smith;Mark Horowitz;Monica S. Lam
Venue:
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Year:
1992

Abstract

The foremost goal of superscalar processor design is to increase performance through the exploitation of instruction-level parallelism (ILP). Previous studies have shown that speculative execution is required for high instructions-per-cycle (IPC) rates in non-numerical applications. The general trend has been toward supporting speculative execution in complicated, dynamically scheduled processors. Performance, though, is more than just a high IPC rate; it also depends upon instruction count and cycle time. Boosting is an architectural technique that supports general speculative execution in simpler, statically scheduled processors. Boosting labels speculative instructions with their control dependence information. This labelling eliminates control dependence constraints on instruction scheduling while still providing full dependence information to the hardware. We have incorporated boosting into a trace-based, global scheduling algorithm that exploits ILP without adversely affecting the instruction count of a program. We use this algorithm and estimates of the boosting hardware involved to evaluate how much speculative execution support is really necessary to achieve good performance. We find that a statically scheduled superscalar processor using a minimal implementation of boosting can easily reach the performance of a much more complex dynamically scheduled superscalar processor.
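
The labelling idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' hardware design: the abstract does not describe the mechanism, so the shadow buffer, the single level of boosting, and every name here (`BoostedMachine`, `execute`, `resolve_branch`, `run`) are assumptions made for illustration. It shows only the general principle that an instruction hoisted above a branch keeps its result out of the architectural state until the branch outcome is known.

```python
from dataclasses import dataclass, field


@dataclass
class BoostedMachine:
    """Toy model with one level of boosting (hypothetical, for illustration)."""
    regs: dict = field(default_factory=dict)    # architectural register state
    shadow: dict = field(default_factory=dict)  # buffered results of boosted instructions

    def execute(self, dest, value, boosted=False):
        # A boosted instruction was moved above a branch by the compiler,
        # so its result is buffered rather than written to `regs`.
        if boosted:
            self.shadow[dest] = value
        else:
            self.regs[dest] = value

    def resolve_branch(self, predicted_taken, actually_taken):
        # If the branch went the way the compiler assumed, commit the
        # buffered results; otherwise discard them. Either way the
        # buffer is empty afterwards.
        if predicted_taken == actually_taken:
            self.regs.update(self.shadow)
        self.shadow.clear()


def run(actually_taken):
    """Execute one boosted instruction, then resolve the branch it depends on."""
    m = BoostedMachine(regs={"r1": 1})
    m.execute("r2", m.regs["r1"] + 41, boosted=True)  # hoisted above the branch
    m.resolve_branch(predicted_taken=True, actually_taken=actually_taken)
    return m.regs


print(run(actually_taken=True))   # {'r1': 1, 'r2': 42}
print(run(actually_taken=False))  # {'r1': 1}
```

In this model a mispredicted branch leaves the architectural registers exactly as sequential execution would have left them, which is the property that lets a static scheduler move instructions across branches safely.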