Optimized register renaming scheme for stack-based x86 operations

  • Authors:
  • Xuehai Qian;He Huang;Zhenzhong Duan;Junchao Zhang;Nan Yuan;Yongbin Zhou;Hao Zhang;Huimin Cui;Dongrui Fan

  • Affiliations:
  • Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (all authors)

  • Venue:
  • ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
  • Year:
  • 2007

Abstract

The stack-based floating point unit (FPU) of the x86 architecture limits its floating point (FP) performance. A flat register file can improve FP performance but compromises x86 compatibility. This paper presents an optimized two-phase FP register renaming scheme for implementing an x86-compliant processor. The scheme eliminates both the implicit dependencies between consecutive FP instructions and redundant operations. The techniques used in its second phase have two further applications: they can eliminate redundant loads and reduce the mis-speculation ratio of the load-store queue. Moreover, adding the architectural support required by this scheme can also boost the performance of a binary translation system that translates x86 instructions to a MIPS-like ISA.
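
The abstract does not spell out the two phases, so the following is only a minimal sketch of the general idea it builds on, not the authors' scheme. It assumes a toy renamer that resolves stack-relative x87 operands ST(i) to flat register numbers through a tracked top-of-stack pointer, and that handles FXCH by swapping two map entries instead of emitting an operation. The class `X87Renamer`, the function `rename`, and the tuple-based instruction encoding are all hypothetical names introduced for illustration.

```python
class X87Renamer:
    """Toy model: resolves stack-relative ST(i) to flat register numbers."""

    def __init__(self):
        self.top = 0                        # tracked top-of-stack pointer (x87 TOP)
        self.slot_to_reg = list(range(8))   # stack slot -> flat register (assumed map)

    def st(self, i):
        # ST(i) names the slot at (TOP + i) mod 8.
        return self.slot_to_reg[(self.top + i) % 8]

    def push(self):
        # A push decrements TOP; the new ST(0) is the destination.
        self.top = (self.top - 1) % 8
        return self.st(0)

    def pop(self):
        # A pop reads ST(0), then increments TOP.
        reg = self.st(0)
        self.top = (self.top + 1) % 8
        return reg

    def fxch(self, i):
        # Exchange ST(0) and ST(i) by swapping map entries; no data moves.
        a, b = self.top % 8, (self.top + i) % 8
        m = self.slot_to_reg
        m[a], m[b] = m[b], m[a]


def rename(program):
    """Translate (opcode, operand) pairs into flat-register operations."""
    r = X87Renamer()
    out = []
    for op, arg in program:
        if op == "fld":        # push a memory operand
            out.append(("load", r.push(), arg))
        elif op == "fxch":     # absorbed by the map, nothing emitted
            r.fxch(arg)
        elif op == "faddp":    # ST(arg) = ST(arg) + ST(0), then pop
            dst = r.st(arg)
            src = r.pop()
            out.append(("add", dst, dst, src))
        elif op == "fstp":     # store ST(0) to memory, then pop
            out.append(("store", arg, r.pop()))
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return out


program = [("fld", "a"), ("fld", "b"), ("fxch", 1), ("faddp", 1), ("fstp", "c")]
flat_ops = rename(program)
```

In this sketch the five stack instructions become four flat-register operations: the FXCH disappears, and each remaining operation names its registers explicitly, so none of them depends on the top-of-stack pointer at execution time.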