Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Global register allocation at link time
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
On the use of registers vs. cache to minimize memory traffic
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Highly concurrent scalar processing
Highly concurrent scalar processing
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Postpass Code Optimization of Pipeline Constraints
ACM Transactions on Programming Languages and Systems (TOPLAS)
An analysis of inline substitution for a structured programming language
Communications of the ACM
Inline routines in VAXELN Pascal
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Computer Architecture and Parallel Processing
Computer Architecture and Parallel Processing
The Design of an Optimizing Compiler
The Design of an Optimizing Compiler
Hardware/software tradeoffs for increased performance
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Register allocation for free: The C machine stack cache
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
RISC I: A Reduced Instruction Set VLSI Computer
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Register allocation and code scheduling for load/store architectures
Register allocation and code scheduling for load/store architectures
Instruction issue logic for high-performance, interruptable pipelined processors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
IEEE Transactions on Computers
Exploiting short-lived variables in superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Instruction issue logic for high-performance, interruptable pipelined processors
25 years of the international symposia on Computer architecture (selected papers)
Instruction fetch unit for parallel execution of branch instructions
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Efficient Instruction Sequencing with Inline Target Insertion
IEEE Transactions on Computers
Hi-index | 0.01 |
In this paper, the WISQ architecture is described. This architecture is designed to achieve high performance by exploiting new compiler technology and using a highly segmented pipeline. By having a highly segmented pipeline, a very-high-speed clock can be used. Since a highly segmented pipeline will require relatively long pipelines, a way must be provided to minimize the effects of pipeline bubbles that are formed due to data and control dependencies. It is also important to provide a way of supporting precise interrupts. These goals are met, in part, by providing a reorder buffer to help restore the machine to a precise state. The architecture then makes the pipelining visible to the programmer/compiler by making the reorder buffer accessible and by explicitly providing that issued instructions cannot be affected by immediately preceding ones. Compiler techniques have been identified that can take advantage of the reorder buffer and permit a sustained execution rate approaching or exceeding one per clock. These techniques include using trace scheduling and providing a relatively easy way to “undo” instructions if the predicted branch path is not taken. We have also studied ways to further reduce the effects of branches by not having them executed in the execution unit. In particular, branches are detected and resolved in the instruction fetch unit. Using this approach, the execution unit is sent a stream of instructions (without branches) that are guaranteed to execute.