Highly concurrent scalar processing

Authors:
P. Y T Hsu;E. S. Davidson
Affiliations:
IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY;Computer Systems Group, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1101 W. Springfield Ave., Urbana, Illinols
Venue:
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Year:
1986

Citing 13
Cited 56

The CRAY-1 computer system

Communications of the ACM - Special issue on computer architecture
Guarded commands, nondeterminacy and formal derivation of programs

Communications of the ACM
The Architecture of Symbolic Computers

The Architecture of Symbolic Computers
Structure of Computers and Computations

Structure of Computers and Computations
MIPS: A microprocessor architecture

MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
Optimizing delayed branches

MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
Very Long Instruction Word architectures and the ELI-512

ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Measurement and analysis of instruction use in the VAX-11/780

ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Decoupled access/execute computer architectures

ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Hardware/software tradeoffs for increased performance

ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
The 801 minicomputer

ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
A study of branch prediction strategies

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
RISC I: A Reduced Instruction Set VLSI Computer

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

Instruction issue logic for high-performance, interruptable pipelined processors

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Reducing the Branch Penalty in Pipelined Processors

Computer
Operation scheduling in reconfigurable, multifunction pipelines

ACM SIGMICRO Newsletter
Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

IEEE Transactions on Computers
High-bandwidth data memory systems for superscalar processors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Branch Strategies: Modeling and Optimization (Pipeline Processing)

IEEE Transactions on Computers
Sentinel scheduling for VLIW and superscalar processors

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Sentinel scheduling: a model for compiler-controlled speculative execution

ACM Transactions on Computer Systems (TOCS)
Speculative disambiguation: a compilation technique for dynamic memory disambiguation

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Guarded execution and branch prediction in dynamic ILP processors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Height reduction of control recurrences for ILP processors

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Register file port requirements of transport triggered architectures

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Characterizing the impact of predicated execution on branch prediction

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Facilitating superscalar processing via a combined static/dynamic register renaming scheme

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Software pipelining

ACM Computing Surveys (CSUR)
Unconstrained speculative execution with predicated state buffering

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Critical path reduction for scalar programs

Proceedings of the 28th annual international symposium on Microarchitecture
A comparison of full and partial predicated execution support for ILP processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance comparison of ILP machines with cycle time evaluation

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Analysis techniques for predicated code

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Global predicate analysis and its application to register allocation

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
GPMB—software pipelining branch-intensive loops

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A framework for balancing control flow and predication

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Resource-sensitive profile-directed data flow analysis for code optimization

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Integrated predicated and speculative execution in the IMPACT EPIC architecture

Proceedings of the 25th annual international symposium on Computer architecture
Instruction issue logic for high-performance, interruptable pipelined processors

25 years of the international symposia on Computer architecture (selected papers)
Multiscalar processors

25 years of the international symposia on Computer architecture (selected papers)
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
The program decision logic approach to predicated execution

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication

International Journal of Parallel Programming
Using profiling to reduce branch misprediction costs on a dynamically scheduled processor

Proceedings of the 14th international conference on Supercomputing
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
An integrated approach to accelerate data and predicate computations in hyperblocks

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures

International Journal of Parallel Programming
Efficient Instruction Sequencing with Inline Target Insertion

IEEE Transactions on Computers
Instruction Window Size Trade-Offs and Characterization of Program Parallelism

IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution

IEEE Transactions on Computers
Efficient Exploitation of Instruction-Level Parallelism for Superscalar Processors by the Conjugate Register File Scheme

IEEE Transactions on Computers
Resource Spackling: A Framework for Integrating Register Allocation in Local and Global Schedulers

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
An Architectural Overview of the Programmable Multimedia Processor, TM-1

COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Inter-Cluster Communication Models for Clustered VLIW Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Selective Guarded Execution Using Profiling on a Dynamically Scheduled Processor

IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
Analyzing the Individual/Combined Effects of Speculative and Guarded Execution on a Superscalar Architecture

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Cluster assignment of global values for clustered VLIW processors

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set

Proceedings of the International Symposium on Code Generation and Optimization
PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Inter-cluster communication in VLIW architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Tree traversal scheduling: a global instruction scheduling technique for VLIW/EPIC processors

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing

Quantified Score

Hi-index	0.02

Visualization

Abstract

High speed scalar processing is an essential characteristic of high performance general purpose computer systems. Highly concurrent execution of scalar code is difficult due to data dependencies and conditional branches. This paper proposes an architectural concept called guarded instructions to reduce the penalty of conditional branches in deeply pipelined processors. A code generation heuristic, the decision tree scheduling technique, reorders instructions in a complex of basic blocks so as to make efficient use of guarded instructions. Performance evaluation of several benchmarks are presented, including a module from the UNIX kernel. Even with these difficult scalar code examples, a speedup of two is achievable by using conventional pipelined uniprocessors augmented by guard instructions, and a speedup of three or more can be achieved using processors with parallel instruction pipelines.