A Reconfigurable Low-Power High-Performance Matrix Multiplier Design

  • Authors: Rong Lin
  • Affiliations: -
  • Venue: ISQED '00 Proceedings of the 1st International Symposium on Quality of Electronic Design
  • Year: 2000

Abstract

A novel reconfigurable low-power high-performance matrix multiplier architecture and its component circuits are presented. The processor can be easily reconfigured to compute the product of matrices X(n x k) and Y(k x m) for any integers n, k, m and any item precision b (ranging from 4 to 64 bits), thus maximizing the utilization of the available hardware.

As a typical example, the hardware equivalent to one 64 x 64 bit high-precision multiplier in the system can be directly reconfigured to produce the product of two matrices X(8x8) and Y(8x8) of 8-bit items in 9 pipeline cycles. The same product would require 512 multiplications (done by large multipliers) in a non-reconfigurable high-precision system.

Given an input stream of h x h matrix pairs with b-bit items, the processor, called a matrix multiplier of size s (note s = hb), may consist of an array of (s/m)^2 small m x m multipliers (here m denotes the small-multiplier width, not the matrix dimension above; the m = 4 case is illustrated), a few arrays of adders each adding three numbers, an array of accumulators, and corresponding simple reconfiguration switches. To compute the product of X(n x k) and Y(k x m) of item precision b on the proposed processor of size s, we only need to partition X and Y into (s/b) x (s/b) sub-matrices, reconfigure the processor according to the values of s (fixed) and b (input parameter), compute the products of the sub-matrices, and accumulate them into the desired result in pipelined fashion.

A recently proposed shift switch logic, a non-binary logic for arithmetic circuits, is utilized in the design. The novel logic operates on 4-bit state signals in which no more than half of the signal bits are subject to a value change at any logic stage. As verified by SPICE simulation, this significantly reduces the power dissipation of the large circuit while maintaining high speed and a small VLSI area.
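
The partition-and-accumulate scheme described in the abstract can be modelled in software. The Python sketch below is illustrative only and is not the paper's circuit. It assumes unsigned items, builds each b-bit product from 4 x 4-bit partial products using the standard shift-and-add decomposition (the abstract does not spell out how the small multipliers are combined), and runs sequentially rather than in a pipeline. The function names are hypothetical, and the third matrix dimension is called p here to avoid clashing with the small-multiplier width m.

```python
def mul_from_small_products(x, y, b, m=4):
    """Multiply two unsigned b-bit integers using only m x m-bit products.

    Assumption: the standard shift-and-add decomposition; this is not
    taken from the paper.
    """
    assert b % m == 0, "item precision must be a multiple of m"
    mask = (1 << m) - 1
    digits = b // m
    total = 0
    for i in range(digits):
        xi = (x >> (m * i)) & mask          # i-th m-bit digit of x
        for j in range(digits):
            yj = (y >> (m * j)) & mask      # j-th m-bit digit of y
            total += (xi * yj) << (m * (i + j))
    return total


def small_multiplier_count(s, m=4):
    """Number of m x m multipliers in a processor of size s: (s/m)^2."""
    return (s // m) ** 2


def blocked_matmul(X, Y, s, b):
    """Multiply X (n x k) by Y (k x p) using h x h sub-matrices, h = s/b.

    Each sub-matrix product is accumulated into the result, mirroring
    the accumulate step of the abstract (sequentially, not pipelined).
    """
    h = s // b
    n, k, p = len(X), len(Y), len(Y[0])
    Z = [[0] * p for _ in range(n)]
    for i0 in range(0, n, h):               # block row of X
        for j0 in range(0, p, h):           # block column of Y
            for t0 in range(0, k, h):       # shared block index
                for i in range(i0, min(i0 + h, n)):
                    for j in range(j0, min(j0 + h, p)):
                        acc = 0
                        for t in range(t0, min(t0 + h, k)):
                            acc += mul_from_small_products(X[i][t], Y[t][j], b)
                        Z[i][j] += acc      # accumulate the block product
    return Z


if __name__ == "__main__":
    X = [[1, 2, 3], [4, 5, 6]]
    Y = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(X, Y, s=16, b=8))
    print(small_multiplier_count(64))
```

For the abstract's example, s = 64 and b = 8 give h = 8, so one pair of 8 x 8 matrices fills the processor, and the 8 x 8 x 8 = 512 item multiplications match the count quoted above.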