Circuits for wide-window superscalar processors

Authors:
Dana S. Henry;Bradley C. Kuszmaul;Gabriel H. Loh;Rahul Sami
Affiliations:
Yale University, Departments of Computer Science and Electrical Engineering;Yale University, Departments of Computer Science and Electrical Engineering;Yale University, Departments of Computer Science and Electrical Engineering;Yale University, Departments of Computer Science and Electrical Engineering
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

Abstract

Our program benchmarks and simulations of novel circuits indicate that large-window processors are feasible. Using our redesigned superscalar components, a large-window processor implemented in today's technology can achieve an increase of 10-60% (geometric mean of 31%) in program speed compared to today's processors. The processor operates at clock speeds comparable to today's processors, but achieves significantly higher ILP.

To measure the impact of a large window on clock speed, we design and simulate new implementations of the logic components that most limit the critical path of our large-window processor: the schedule logic and the wake-up logic. We use log-depth cyclic segmented prefix (CSP) circuits to reimplement these components. Our layouts and simulations of critical paths through these circuits indicate that our large-window processor could be clocked at frequencies exceeding 500 MHz in today's technology. Our commit logic and rename logic can also run at these speeds.

To measure the impact of a large window on ILP, we compare two microarchitectures. The first has a 128-instruction window, an 8-wide fetch unit, and 20-wide issue (four each of integer, branch, multiply, float, and memory units). The second has a 32-instruction window and a 4-wide fetch unit, and is comparable to today's processors. For each, we simulate different window reuse and bypass policies. Our simulations show that the large-window processor achieves significantly higher IPC. This performance increase comes despite the fact that the large-window processor uses a wrap-around window while the small-window processor uses a compressing window, which effectively increases the small-window processor's number of outstanding instructions. Furthermore, the large-window processor sometimes pays an extra clock cycle for bypassing.
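
The abstract names log-depth cyclic segmented prefix (CSP) circuits but does not define them. As a rough software illustration only, the sketch below assumes one common reading: the inputs sit on a ring, a segment bit marks where accumulation restarts (for example, at the oldest instruction in a wrap-around window), and each position receives the inclusive combination of all values from the nearest segment start up to itself. The function names, the inclusive convention, and the request-counting example are assumptions made for illustration, not the authors' circuit.

```python
from operator import add


def csp_reference(values, seg, op):
    """Direct definition: for each ring position i, fold op over the values
    from the nearest segment start at or before i (wrapping) through i."""
    if not any(seg):
        raise ValueError("at least one segment bit must be set")
    n = len(values)
    out = []
    for i in range(n):
        j = i
        while not seg[j]:          # walk backward around the ring
            j = (j - 1) % n
        acc = values[j]
        while j != i:              # fold forward up to and including i
            j = (j + 1) % n
            acc = op(acc, values[j])
        out.append(acc)
    return out


def _combine(left, right, op):
    """Associative segmented operator on (value, has_segment_start) pairs."""
    lv, lf = left
    rv, rf = right
    return (rv if rf else op(lv, rv), lf or rf)


def csp_log_depth(values, seg, op):
    """Same result in ceil(log2 n) doubling rounds; each round models one
    level of a parallel-prefix network (all positions update at once)."""
    if not any(seg):
        raise ValueError("at least one segment bit must be set")
    n = len(values)
    state = list(zip(values, seg))
    d = 1
    while d < n:
        state = [_combine(state[(i - d) % n], state[i], op) for i in range(n)]
        d *= 2
    return [v for v, _ in state]


# Hypothetical example: an 8-entry wrap-around window whose oldest entry is
# at index 5; requests[i] == 1 means entry i is asking to issue.
requests = [1, 0, 1, 1, 0, 1, 0, 1]
oldest = [i == 5 for i in range(8)]
rank = csp_log_depth(requests, oldest, add)
print(rank)  # [3, 3, 4, 5, 5, 1, 1, 2]
```

In this example, each requesting entry's rank counts the requesters from the oldest entry up to itself, so a scheduler with k units could grant the requesters whose rank is at most k. The number of rounds grows with the logarithm of the window size, which is the property the abstract relies on when it describes the circuits as log-depth.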