Bit matrix multiplication in commodity processors

  • Authors:
  • Yedidya Hilewitz;Cedric Lauradoux;Ruby B. Lee

  • Affiliations:
  • Department of Electrical Engineering, Princeton University, NJ 08540, USA;Department of Electrical Engineering, Princeton University, NJ 08540, USA;Department of Electrical Engineering, Princeton University, NJ 08540, USA

  • Venue:
  • ASAP '08 Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors
  • Year:
  • 2008

Abstract

Registers in processors generally contain words or, with the addition of multimedia extensions, short vectors of subwords such as bytes or 16-bit elements. In this paper, we view the contents of registers as vectors or matrices of individual bits. However, the facility to operate efficiently at the bit level is generally lacking: a commodity processor usually has only logical and shift instructions and, occasionally, a population count instruction. Perhaps the most powerful primitive bit-level operation is the bit matrix multiply (BMM) instruction, currently found only in supercomputers such as those from Cray. This instruction multiplies two n × n bit matrices. In this paper, we show the power of BMM. We propose and analyze new processor instructions that implement simpler BMM primitive operations more suitable for a commodity processor. We show the impact of BMM on the performance of critical application kernels and discuss its hardware cost.
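
To make the operation concrete, the following is a minimal software sketch of an 8 × 8 bit matrix multiply, not the instructions proposed in the paper. It rests on several assumptions that the abstract does not state: the product is taken over GF(2) (AND for multiplication, XOR for addition), each matrix is packed into a 64-bit integer with row i in bits 8i to 8i+7, and column j of a row is bit j of that byte. The names `bmm8`, `IDENTITY`, and `ANTI_IDENTITY` are hypothetical and chosen only for illustration.

```python
def bmm8(a: int, b: int) -> int:
    """Multiply two 8x8 bit matrices over GF(2) (assumed semantics).

    Each matrix is packed into a 64-bit integer: row i occupies
    bits 8*i .. 8*i+7, and column j of a row is bit j of that byte.
    """
    c = 0
    for i in range(8):
        row_a = (a >> (8 * i)) & 0xFF
        acc = 0
        # Row i of the product is the XOR of the rows of b selected
        # by the set bits in row i of a.
        for k in range(8):
            if (row_a >> k) & 1:
                acc ^= (b >> (8 * k)) & 0xFF
        c |= acc << (8 * i)
    return c


# Row i has only bit i set.
IDENTITY = 0x8040201008040201
# Row i has only bit 7-i set (a permutation matrix).
ANTI_IDENTITY = 0x0102040810204080

x = 0x0102030405060708
reversed_rows = bmm8(ANTI_IDENTITY, x)   # permutes the rows (bytes) of x
reversed_bits = bmm8(0x01, ANTI_IDENTITY)  # permutes the bits within a row
```

Under these assumptions, multiplying on the left by a permutation matrix reorders the rows (bytes) of the operand, while multiplying on the right reorders the bits within each row. This illustrates why a single BMM can stand in for long sequences of shifts and logical operations.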