Scalable selective re-execution for EDGE architectures

Authors:
Rajagopalan Desikan;Simha Sethumadhavan;Doug Burger;Stephen W. Keckler
Affiliations:
The University of Texas at Austin;The University of Texas at Austin;The University of Texas at Austin;The University of Texas at Austin
Venue:
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Year:
2004

Citing 22
Cited 4

Dynamic memory disambiguation using the memory conflict buffer

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Compiler-directed early load-address generation

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Speculation techniques for improving load related instruction scheduling

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Monsoon: an explicit token-store architecture

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
A design space evaluation of grid processor architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Measuring Experimental Error in Microprocessor Simulation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Will Physical Scalability Sabotage Performance Gains?

Computer
The Alpha 21264 Microprocessor

IEEE Micro
A Study of Control Independence in Superscalar Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
WaveScalar

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scaling to the End of Silicon with EDGE Architectures

Computer
Coherence decoupling: making use of incoherence

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Understanding Scheduling Replay Schemes

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Reducing Branch Misprediction Penalty via Selective Branch Recovery

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture

ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
RETCON: transactional repair without replay

Proceedings of the 37th annual international symposium on Computer architecture
Distributed replay protocol for distributed uniprocessors

Proceedings of the 26th ACM international conference on Supercomputing
Rapid, low-power loop execution in a network of functional units

Proceedings of the 17th Panhellenic Conference on Informatics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines. Selective re-execution is a technique that can reduce the penalty of mis-speculations by re-executing only instructions affected by the mis-speculation, instead of all instructions. In this paper we introduce a new selective re-execution mechanism that exploits the properties of a dataflow-like Explicit Data Graph Execution (EDGE) architecture to support efficient mis-speculation recovery, while scaling to window sizes of thousands of instructions with high performance. This distributed selective re-execution (DSRE) protocol permits multiple speculative waves of computation to be traversing a dataflow graph simultaneously, with a commit wave propagating behind them to ensure correct execution. We evaluate one application of this protocol to provide efficient recovery for load-store dependence speculation. Unlike traditional dataflow architectures which resorted to single-assignment memory semantics, the DSRE protocol combines dataflow execution with speculation to enable high performance and conventional sequential memory semantics. Our experiments show that the DSRE protocol results in an average 17% speedup over the best dependence predictor proposed to date, and obtains 82% of the performance possible with a perfect oracle directing the issue of loads.