High-performance reconfigurable hardware architecture for restricted Boltzmann machines

  • Authors:
  • Daniel Le Ly;Paul Chow

  • Affiliations:
  • Department of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY and Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145- fold over an optimized C program running on a 2.8-GHz Intel processor.