Superspeculative Microarchitecture for Beyond AD 2000

Authors:
Mikko H. Lipasti;John Paul Shen
Venue:
Computer
Year:
1997

Citing 11
Cited 17

Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The intrinsic bandwidth requirements of ordinary programs

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Eliminating operand read latency

ACM SIGARCH Computer Architecture News
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Proceedings of the 24th annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Value locality and speculative execution

Value locality and speculative execution
The Performance Potential of Value and Dependence Prediction

Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing

Pipeline gating: speculation control for energy reduction

Proceedings of the 25th annual international symposium on Computer architecture
PipeRench: a co/processor for streaming multimedia acceleration

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Access region locality for high-bandwidth processor memory system design

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Limits of Data Value Predictability

International Journal of Parallel Programming
Extending Value Reuse to Basic Blocks with Compiler Support

IEEE Transactions on Computers
A High-Bandwidth Memory Pipeline for Wide Issue Processors

IEEE Transactions on Computers
Computer Systems Research: The Pressure Is On

Computer
Realizing High IPC Using Time-Tagged Resource-Flow Computing

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Realizing high IPC through a scalable memory-latency tolerant multipath microarchitecture

ACM SIGARCH Computer Architecture News
Multiple-path execution for chip multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Billion-Transistor Architectures: There and Back Again

Computer
Supporting microthread scheduling and synchronisation in CMPs

International Journal of Parallel Programming
NoSQ: Store-Load Communication without a Store Queue

IEEE Micro
Access region cache with register guided memory reference partitioning

Journal of Systems Architecture: the EUROMICRO Journal
Effect of increasing chip density on the evolution of computer architectures

IBM Journal of Research and Development
Dynamic partition of memory reference instructions – a register guided approach

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing

Abstract

Based on their research at Carnegie Mellon University, the authors argue for billion-transistor uniprocessors. Like Patt et al., they divide the important implementation problems into three components: instruction flow, register dataflow, and memory dataflow. They also argue for trace caches and advanced branch prediction. Their article, however, focuses on using massive speculation at all levels to improve performance. They claim that without this much speculation, future processors will be limited by true data dependences and will be unable to harvest enough instruction-level parallelism (ILP) to improve performance satisfactorily. Their investigations found large speedups on code that has traditionally not been amenable to ILP extraction.