Fat-trees: universal networks for hardware-efficient supercomputing
IEEE Transactions on Computers
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Introduction to algorithms
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The network architecture of the connection machine CM-5
Journal of Parallel and Distributed Computing
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Digital systems engineering
Communications of the ACM - Special issue on computer architecture
The Counterflow Pipeline Processor Architecture
IEEE Design & Test
DAC '84 Proceedings of the 21st Design Automation Conference
Computational Aspects of VLSI
The VLSI Complexity of Sorting
IEEE Transactions on Computers
A comparison of scalable superscalar processors
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Circuits for wide-window superscalar processors
Proceedings of the 27th annual international symposium on Computer architecture
Optimal organizations for pipelined hierarchical memories
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
Hi-index | 0.00 |
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path lengths of many components in existing implementations grow as T(n2) where n is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected together by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations.