Optimized register renaming scheme for stack-based x86 operations

Authors:
Xuehai Qian;He Huang;Zhenzhong Duan;Junchao Zhang;Nan Yuan;Yongbin Zhou;Hao Zhang;Huimin Cui;Dongrui Fan
Affiliations:
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (all authors)
Venue:
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
Year:
2007

Abstract

The stack-based floating-point unit (FPU) of the x86 architecture limits its floating-point (FP) performance. A flat register file can improve FP performance but can affect x86 compatibility. This paper presents an optimized two-phase floating-point register renaming scheme for use in an x86-compliant processor. The two-phase scheme eliminates both the implicit dependencies between consecutive FP instructions and redundant operations. In two applications of the method, the techniques used in the second phase of the scheme eliminate redundant loads and reduce the mis-speculation ratio of the load-store queue. Moreover, adding the architectural support required by this optimized scheme can also boost the performance of a binary translation system that translates x86 instructions to a MIPS-like ISA.
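
The abstract does not spell out the two phases, so the following is only a minimal sketch of the general problem it refers to, not the authors' scheme. In x87 code every operand ST(i) is named relative to the top-of-stack pointer, so each FP instruction implicitly depends on the pushes and pops before it. One commonly described remedy is to track the top-of-stack at decode time and emit flat register numbers, which also lets FXCH be handled as a table swap with no data movement. The class `StackRenamer`, the function `translate`, and the micro-op tuples are hypothetical names introduced for illustration.

```python
NUM_FP_REGS = 8  # the x87 register stack has eight entries


class StackRenamer:
    """Hypothetical decode-time tracker mapping ST(i) to flat register numbers."""

    def __init__(self):
        self.top = 0                            # decode-time copy of the x87 TOP field
        self.table = list(range(NUM_FP_REGS))   # stack slot -> flat register

    def _slot(self, i):
        # ST(i) lives in stack slot (TOP + i) mod 8.
        return (self.top + i) % NUM_FP_REGS

    def st(self, i):
        return self.table[self._slot(i)]

    def push(self):
        # A push decrements TOP; the new ST(0) is the destination.
        self.top = (self.top - 1) % NUM_FP_REGS
        return self.st(0)

    def pop(self):
        # A pop reads ST(0) and then increments TOP.
        src = self.st(0)
        self.top = (self.top + 1) % NUM_FP_REGS
        return src

    def fxch(self, i):
        # Swap table entries instead of moving register contents.
        a, b = self._slot(0), self._slot(i)
        self.table[a], self.table[b] = self.table[b], self.table[a]


def translate(program):
    """Turn (opcode, argument) pairs into flat-register micro-ops (assumed format)."""
    r = StackRenamer()
    out = []
    for op, arg in program:
        if op == "fld":            # fld mem: push a value loaded from memory
            out.append(("load", r.push(), arg))
        elif op == "fxch":         # fxch st(i): handled entirely in the table
            r.fxch(arg)
        elif op == "faddp":        # faddp st(i), st(0): ST(i) += ST(0), then pop
            dst, src = r.st(arg), r.st(0)
            out.append(("add", dst, dst, src))
            r.pop()
        elif op == "fstp":         # fstp mem: store ST(0) to memory, then pop
            out.append(("store", arg, r.pop()))
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return out


if __name__ == "__main__":
    prog = [("fld", "a"), ("fld", "b"), ("fxch", 1), ("faddp", 1), ("fstp", "c")]
    for uop in translate(prog):
        print(uop)
```

In this sketch the five stack instructions become four micro-ops that name flat registers directly: the FXCH produces no micro-op, and none of the emitted operations needs to read a top-of-stack pointer at execution time. A real design would also have to handle stack overflow and underflow, tag bits, and recovery of the tracked state after a mis-speculation, all of which are omitted here.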