ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause of this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism of neural networks is desirable. This paper investigates how the restricted Boltzmann machine (RBM), a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing a single RBM to be distributed across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex-II Pro XC2VP70 FPGAs running at 100 MHz in a variety of configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which achieved a computational speed of 3.13 billion connection-updates-per-second and a 145-fold speedup over an optimized C program running on a 2.8-GHz Intel processor.
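The partitioning idea and the throughput figure above can be illustrated with a minimal software sketch. This is not the paper's hardware implementation: the `partition_rbm` helper, the 2 × 2 grid, and the NumPy representation are assumptions chosen for illustration, and only the 256 × 256 size and the 3.13 billion CUPS figure come from the abstract.

```python
import numpy as np

def partition_rbm(W, grid=(2, 2)):
    """Split an RBM weight matrix into congruent rectangular blocks,
    one block per FPGA (a hypothetical software analogue of the
    partitioning scheme described in the abstract)."""
    rows, cols = grid
    return [blk
            for band in np.vsplit(W, rows)   # split along visible nodes
            for blk in np.hsplit(band, cols)]  # split along hidden nodes

# A 256 x 256 RBM split across four FPGAs yields four 128 x 128 blocks.
W = np.random.randn(256, 256)
blocks = partition_rbm(W)

# Throughput metric used in the abstract: connection-updates-per-second.
connections = 256 * 256        # 65,536 weights in the full RBM
cups = 3.13e9                  # reported peak performance
full_updates_per_second = cups / connections  # whole-network update rate
```

Under this reading, one full-network weight update touches all 65,536 connections, so 3.13 billion CUPS corresponds to roughly 4.8 × 10⁴ full updates per second.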