Unconstrained speculative execution with predicated state buffering

  • Authors:
  • Hideki Ando;Chikako Nakanishi;Tetsuya Hara;Masao Nakaya

  • Affiliations:
  • System LSI Laboratory, Mitsubishi Electric Corporation, 4-1 Mizuhara, Itami, Hyogo, 664 Japan;System LSI Laboratory, Mitsubishi Electric Corporation, 4-1 Mizuhara, Itami, Hyogo, 664 Japan;System LSI Laboratory, Mitsubishi Electric Corporation, 4-1 Mizuhara, Itami, Hyogo, 664 Japan;System LSI Laboratory, Mitsubishi Electric Corporation, 4-1 Mizuhara, Itami, Hyogo, 664 Japan

  • Venue:
  • ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

Speculative execution is execution of instructions before it is known whether these instructions should be executed. Compiler-based speculative execution has the potential to achieve both a high instruction per cycle rate and high clock rate. Pure compiler-based approaches, however, have greatly limited instruction scheduling due to a limited ability to handle side effects of speculative execution. Significant performance improvement is, thus, difficult in non-numerical applications. This paper proposes a new architectural mechanism, called predicating, which provides unconstrained speculative execution. Predicating removes restrictions which limit the compiler's ability to schedule instructions. Through our hardware support, the compiler is allowed to move instructions past multiple basic block boundaries from any succeeding control path. Predicating buffers the side effects of speculative execution with its predicate, and the buffered predicate efficiently commits or squashes the side effects. The mechanism also provides a speculative exception handling scheme. The scheme, called the future condition, properly postpones speculative exceptions and efficiently restarts the process. We show that our mechanism can be implemented through a modest amount of hardware with little complexity. The evaluation results show that our mechanism significantly improves performance, and achieves a 2.45x speedup over scalar machines.