ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

  • Authors:
  • Smruti R. Sarangi;Wei Liu, Josep Torrellas;Yuanyuan Zhou

  • Affiliations:
  • University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign

  • Venue:
  • Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

As more data value speculation mechanisms are being proposed to speed-up processors, there is growing pressure on the critical processor structures that must buffer the state of the speculative instructions. A scalable solution is to checkpoint the processor and retire speculative instructions. However, in this environment, misprediction recovery becomes very wasteful, as it involves discarding and re-executing all the instructions executed since the checkpoint. To speed-up execution in this environment, this paper presents a novel architecture (ReSlice) that selectively re-executes only the speculatively-retired instructions that directly depended on the mispredicted value, namely its Forward Slice. ReSlice buffers the (typically very few) instructions in the forward slice of the predicted value as such instructions initially execute. Then, potentially thousands of instructions later, ReSlice can quickly re-execute the slice if a misprediction is declared, and merge its state with the program state. In addition, this paper develops a sufficient condition for correct slice re-execution and merge. As one possible use of ReSlice, we apply it to recover from cross-task dependence violations in a chip multiprocessor with Thread-Level Speculation (TLS). ReSlice speeds up SpecInt applications over aggressive TLS by up to 33%, with a geometric mean of 12%. Moreover, E 脳 D2 decreases by 20%. All this is obtained by saving on average 61% of the task squashes through slice re-execution. On average, a slice re-executes only 6.6 instructions, compared to the 210 that would be re-executed on a squash.