A high-performance FPGA architecture for restricted Boltzmann machines

  • Authors: Daniel L. Ly; Paul Chow
  • Affiliations: University of Toronto, Toronto, ON, Canada (both authors)
  • Venue: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
  • Year: 2009

Abstract

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause of this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Algorithms to implement a neural network in software are typically O(n^2) problems; as a result, neural networks are unable to provide the performance and scalability required in non-academic settings. In this paper, we investigate how FPGAs can be used to take advantage of the inherent parallelism in neural networks to provide a better implementation in terms of scalability and performance. We focus on the Restricted Boltzmann machine, a popular type of neural network, because its architecture is particularly well-suited to hardware designs. The proposed multi-purpose hardware framework is designed to reduce the O(n^2) problem to an O(n) implementation while requiring only O(n) resources. The framework is tested on a Xilinx Virtex II-Pro XC2VP70 FPGA running at 100 MHz. The resources support a Restricted Boltzmann machine of 128x128 nodes, which results in a computational speed of 1.02 billion connection-updates-per-second and a 35-fold speed-up over an optimized C program running on a 2.8 GHz Intel processor.
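
To illustrate where the quadratic cost in software comes from, the following minimal Python sketch computes the hidden-unit probabilities of a Restricted Boltzmann machine with the standard conditional p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij) and counts the multiply-accumulate operations. It is not the authors' implementation: the function names `hidden_probabilities` and `updates_per_cycle` are hypothetical, the all-zero weights are placeholder data, and reading n as the number of nodes per layer is an assumption, since the abstract does not define n. The last line only divides the two figures reported in the abstract and makes no claim about how the hardware pipeline is organized.

```python
import math


def hidden_probabilities(weights, hidden_bias, visible):
    """One visible-to-hidden pass; returns (probabilities, mac_count).

    weights[i][j] connects visible node i to hidden node j (assumed layout).
    """
    mac_count = 0
    probabilities = []
    for j, bias in enumerate(hidden_bias):
        activation = bias
        # Inner loop over every visible node: n * n work in total.
        for i, v in enumerate(visible):
            activation += weights[i][j] * v
            mac_count += 1
        probabilities.append(1.0 / (1.0 + math.exp(-activation)))
    return probabilities, mac_count


def updates_per_cycle(cups, clock_hz):
    """Average connection updates per clock cycle (simple quotient)."""
    return cups / clock_hz


n = 128  # layer size quoted in the abstract (128x128 nodes)
weights = [[0.0] * n for _ in range(n)]  # placeholder weights, not real data
probs, macs = hidden_probabilities(weights, [0.0] * n, [1] * n)

print(macs)                              # 128 * 128 sequential steps
print(updates_per_cycle(1.02e9, 100e6))  # reported CUPS / reported clock
```

In this sequential form, a single pass over a 128x128 network takes 16,384 multiply-accumulate steps, and dividing the reported 1.02 billion connection-updates-per-second by the 100 MHz clock gives roughly 10 updates per cycle on average. A design that evaluates the inner loop in parallel across O(n) hardware units would need on the order of n time steps per pass, which is the kind of reduction the abstract describes.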