Principles of CMOS VLSI design: a systems perspective
Principles of CMOS VLSI design: a systems perspective
Introduction to algorithms
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A comparison of scalable superscalar processors
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
The Alpha 21264 Microprocessor
IEEE Micro
The Ultrascalar Processor-An Asymptotically Scalable Superscalar Microarchitecture
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Reducing the complexity of the issue logic
ICS '01 Proceedings of the 15th international conference on Supercomputing
Execution history guided instruction prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Realizing High IPC Using Time-Tagged Resource-Flow Computing
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Realizing high IPC through a scalable memory-latency tolerant multipath microarchitecture
ACM SIGARCH Computer Architecture News
Front-End Policies for Improved Issue Efficiency in SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Execution History Guided Instruction Prefetching
The Journal of Supercomputing
Combining compiler and runtime IPC predictions to reduce energy in next generation architectures
Proceedings of the 1st conference on Computing frontiers
An efficient wakeup design for energy reduction in high-performance superscalar processors
Proceedings of the 2nd conference on Computing frontiers
IBM Journal of Research and Development - Electrochemical technology in microelectronics
In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs)
IEEE Transactions on Computers
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
IEEE Transactions on Computers
Wake-up logic optimizations through selective match and wakeup range limitation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.01 |
Our program benchmarks and simulations of novel circuits indicate that large-window processors are feasible. Using our redesigned superscalar components, a large-window processor implemented in today's technology can achieve an increase of 10-60% (geometric mean of 31%) in program speed compared to today's processors. The processor operates at clock speeds comparable to today's processors, but achieves significantly higher ILP.To measure the impact of a large window on clock speed, we design and simulate new implementations of the logic components that most limit the critical path of our large-window processor: the schedule logic and the wake-up logic. We use log-depth cyclic segmented prefix (CSP) circuits to reimplement these components. Our layouts and simulations of critical paths through these circuits indicate that our large-window processor could be clocked at frequencies exceeding 500MHz in today's technology. Our commit logic and rename logic can also run at these speeds.To measure the impact of a large window on ILP, we compare two microarchitectures, the first has a 128-instruction window, an 8-wide fetch unit, and 20-wide issue (four integer, branch, multiply, float, and memory units), whereas the second has a 32-instruction window, and a 4-wide fetch unit and is comparable to today's processors. For each, we simulate different window reuse and bypass policies. Our simulations show that the large-window processor achieves significantly higher IPC. This performance increase comes despite the fact that the large-window processor uses a wrap-around window while the small-window processor uses a compressing window, thus effectively increasing its number of outstanding instructions. Furthermore, the large-window processor sometimes pays an extra clock cycle for bypassing.