Realizing High IPC Using Time-Tagged Resource-Flow Computing
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A microarchitecture is described that achieves high performance on conventional single-threaded program codes without compiler assistance. To obtain a high number of instructions per clock (IPC) on inherently sequential codes (e.g., the SpecInt-2000 programs), a large number of instructions must be in flight simultaneously. However, such microarchitectures face several problems, including scalability, issues related to control flow, and memory latency.

Our design investigates how to utilize a large mesh of processing elements in order to execute a single-threaded program. We present a basic overview of our microarchitecture and discuss how it addresses scalability as we attempt to execute many instructions in parallel. The microarchitecture makes use of control and value speculative execution, multipath execution, and a high degree of out-of-order execution to help extract instruction-level parallelism. Execution-time predication and time-tags for operands are used to maintain program order. We provide simulation results for several geometries of our microarchitecture, illustrating a range of design trade-offs. Results are also presented that show the small performance impact over a range of memory-system latencies.
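The abstract's use of operand time-tags to maintain program order can be illustrated with a small sketch. This is not the authors' implementation; the names (`Operand`, `snoop`) and the integer time-tags are illustrative assumptions. The rule shown is the usual resource-flow snooping condition: a consumer accepts a broadcast result only if its producer precedes the consumer in program order and is newer than the producer whose value the consumer currently holds.

```python
from dataclasses import dataclass

@dataclass
class Operand:
    reg: str           # architectural register this operand reads
    value: int = 0
    src_tag: int = -1  # time-tag of the producer last accepted (-1: none yet)

def snoop(consumer_tag: int, operand: Operand,
          bcast_tag: int, bcast_reg: str, bcast_val: int) -> bool:
    """Accept a broadcast result if it is the newest producer of this
    register that still precedes the consumer in program order."""
    if bcast_reg != operand.reg:
        return False                   # different register; ignore
    if bcast_tag >= consumer_tag:
        return False                   # producer does not precede consumer
    if bcast_tag <= operand.src_tag:
        return False                   # already hold a newer producer's value
    operand.value = bcast_val
    operand.src_tag = bcast_tag
    return True

# A consumer at time-tag 5 reading r1 snoops four broadcasts:
op = Operand(reg="r1")
snoop(5, op, 2, "r1", 10)   # accepted: tag 2 precedes 5
snoop(5, op, 1, "r1", 99)   # rejected: older than current producer (tag 2)
snoop(5, op, 4, "r1", 42)   # accepted: tag 4 is newer and still precedes 5
snoop(5, op, 7, "r1", 7)    # rejected: producer follows the consumer
print(op.value, op.src_tag)  # -> 42 4
```

Under this rule, re-execution after a misprediction needs no global squash: a corrected result simply rebroadcasts with its time-tag, and only consumers for which it is the newest preceding producer update and re-execute.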