A Comparison of FPGA Implementations of Bit-Level and Word-Level Matrix Multipliers

Authors:
Radhika S. Grover;Weijia Shang;Qiang Li
Affiliations:
-;-;-
Venue:
FPL '00 Proceedings of The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
Year:
2000

Citing 4
Cited 0

Computer architecture: a quantitative approach
On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays. IEEE Transactions on Parallel and Distributed Systems
Conflict-Free Scheduling of Nested Loop Algorithms on Lower Dimensional Processor Arrays. IPPS '92 Proceedings of the 6th International Parallel Processing Symposium
Dependence Analysis and Architecture Design for Bit-Level Algorithms. ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01

Abstract

We have implemented a novel bit-level matrix multiplier on a Xilinx FPGA in which each processing element performs a simple operation: adding three to six bits to generate one partial-sum bit and one or two carry-out bits. A speedup over a word-level design is possible because, in a bit-level architecture, the individual bits of a word do not have to be processed as a unit. Previous work has shown that bit-level architectures for fixed-point applications can be O(log p) times faster than the corresponding word-level architecture, where p is the word length. In this paper, the bit-level matrix multiplier is compared with a word-level matrix multiplier composed of highly optimized multiplier and adder macros from the Xilinx Core Generator library. The architecture presented here is even faster than previous ones because it cuts the critical path in the dependence graph in half. Our results show that a speedup by a factor of 2 can be obtained in practice.
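
The processing-element operation described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' hardware design: the function name `bit_level_pe` is hypothetical, and the carry-outs are assumed to be the binary-weighted bits (weight 2 and weight 4) of the input count, which the abstract does not specify.

```python
def bit_level_pe(bits):
    """Behavioral model of one bit-level processing element.

    Adds three to six one-bit inputs and returns (sum_bit, carry_bits).
    Assumption: carries are the binary-weighted upper bits of the total.
    """
    if not 3 <= len(bits) <= 6 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected three to six inputs, each 0 or 1")

    total = sum(bits)              # ranges over 0..len(bits), at most 6
    sum_bit = total & 1            # partial-sum bit (weight 1)
    carry_lo = (total >> 1) & 1    # carry-out of weight 2
    carry_hi = (total >> 2) & 1    # carry-out of weight 4

    # Three inputs sum to at most 3 (binary 11), so one carry is enough;
    # four to six inputs can reach 4..6 and need the second carry.
    carries = (carry_lo,) if len(bits) == 3 else (carry_lo, carry_hi)
    return sum_bit, carries


if __name__ == "__main__":
    print(bit_level_pe([1, 1, 1]))
    print(bit_level_pe([1, 1, 1, 1, 1, 1]))
```

Under this encoding, three inputs need a single carry-out and four to six inputs need two, which matches the "one or two carry-out bits" in the abstract.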