The Ultrascalar Processor: An Asymptotically Scalable Superscalar Microarchitecture

Authors:
Dana S. Henry;Bradley C. Kuszmaul;Vinod Viswanath
Venue:
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Year:
1999

Abstract

The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path lengths of many components in existing implementations grow as Θ(n²), where n is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations.
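
To make the parallel-prefix idea concrete, the following is a minimal software sketch of a generic prefix scan by recursive doubling. It is not the paper's circuit. The names `prefix_scan` and `latest_write`, the eight-station example, and the use of `None` for "no write" are all assumptions made for illustration. The round count models only logic depth, which grows logarithmically with the number of stations; it says nothing about wire delay or layout.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def prefix_scan(values: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Inclusive scan by recursive doubling over an associative operator.

    Returns the scanned list and the number of rounds. All combines within
    one round are independent, so the round count stands in for depth.
    """
    out = list(values)
    rounds = 0
    stride = 1
    while stride < len(out):
        # Position i combines the partial result `stride` places to its left
        # (older) with its own partial result (newer).
        out = [out[i] if i < stride else op(out[i - stride], out[i])
               for i in range(len(out))]
        stride *= 2
        rounds += 1
    return out, rounds


def latest_write(older: Optional[int], newer: Optional[int]) -> Optional[int]:
    """Hypothetical associative operator: the newer write wins if it exists."""
    return older if newer is None else newer


# Hypothetical example: eight stations in program order; None means the
# station does not write the register being tracked.
writes = [7, None, None, 3, None, 9, None, None]
visible, depth = prefix_scan(writes, latest_write)
print(visible, depth)
```

In this sketch each position ends up holding the most recent value written at or before it in program order, and eight inputs need three rounds rather than seven sequential steps.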