Register port complexity reduction in wide-issue processors with selective instruction execution

  • Authors:
  • Rama Sangireddy

  • Affiliations:
  • Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080, USA

  • Venue:
  • Microprocessors & Microsystems
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

As the width of the processor grows, complexity of a register file (RF) with multiple ports grows more than linearly and leads to larger register access time and higher power consumption. Analysis of SPEC2000 programs reveals that only a small portion of the instructions in a program (16% in integer and 38% in floating-point) require both the source operands. Also, when the programs are executed in an 8-wide processor only a very few (two or less) two-source instructions are executed in a cycle for a significant portion of time (more than 98% for integer and 93% for floating-point), leading to a significant under-utilization of register port bandwidth. In this paper, we propose a novel technique to significantly reduce the number of register ports, with a very minor modification in the select logic to issue only a limited number of two-source instructions each cycle. This is achieved with no significant impact on processor's overall performance. The novelty of the technique is that it is easy to implement and succeeds in reducing the access time, power, and area of the register file, without aggravating these factors in any other logic on the chip. With this technique in an 8-wide processor, as compared to a conventional 128-entry RF with 16 read ports, for integer programs a register file can be designed with 11 or 10 read ports as these configurations result in instructions per cycle (IPC) degradation of only 0.929% and 3.38%, respectively. This significantly low degradation in IPC is achieved while reducing the register access time by 9% and 12%, respectively, and reducing power by 35% and 50%, respectively. For FP programs, a register file can be designed with 12 read ports (1.16% IPC loss, 8% less access time, and 28% less power) or with 11 read ports (3.5% IPC loss, 9% less access time, and 35% less power). The paper analyzes the performance of all the possible flavors of the proposed technique for register file in both 4-wide and 8-wide processors, and presents a choice of the performance and register port complexity combination to the designer.