Artificial neural networks (ANNs) are a natural target for hardware acceleration by FPGAs and GPGPUs because commercial-scale applications can require days to weeks to train using CPUs, and the algorithms are highly parallelizable. Previous work on FPGAs has shown how hardware parallelism can be used to accelerate a “Restricted Boltzmann Machine” (RBM) ANN algorithm, and how to distribute computation across multiple FPGAs. Here we describe a fully pipelined parallel architecture that exploits “mini-batch” training (combining many input cases to compute each set of weight updates) to further accelerate ANN training. We implement on an FPGA, for the first time to our knowledge, a more powerful variant of the basic RBM, the “Factored RBM” (fRBM). The fRBM has proved valuable in learning transformations and in discovering features that are present across multiple types of input. We obtain (in simulation) a 100-fold acceleration (vs. CPU software) for an fRBM having N = 256 units in each of its four groups (two input, one output, one intermediate group of units) running on a Virtex-6 LX760 FPGA. Many of the architectural features we implement are applicable not only to fRBMs, but to basic RBMs and other ANN algorithms more broadly.
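The mini-batch training the abstract exploits can be sketched as follows: a minimal NumPy illustration of one contrastive-divergence (CD-1) weight update for a basic RBM, where a whole batch of input cases is folded into a single gradient estimate. This is a generic sketch, not the paper's architecture; all names (`cd1_update`, `n_visible`, sizes, the omission of bias terms) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.01):
    """One CD-1 weight update computed from an entire mini-batch v0
    of shape (batch_size, n_visible). Bias terms omitted for brevity."""
    batch_size = v0.shape[0]
    # Positive phase: hidden-unit probabilities given the data.
    h0 = sigmoid(v0 @ W)
    # Sample binary hidden states, then reconstruct the visible layer.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T)
    # Negative phase: hidden probabilities given the reconstruction.
    h1 = sigmoid(v1 @ W)
    # All batch cases contribute to one accumulated gradient; it is this
    # accumulation that a pipelined hardware design can stream case-by-case.
    grad = (v0.T @ h0 - v1.T @ h1) / batch_size
    return W + lr * grad

W = 0.01 * rng.standard_normal((6, 4))   # 6 visible units, 4 hidden units
batch = rng.random((32, 6))              # 32 input cases per mini-batch
W = cd1_update(W, batch)
```

Because the weight matrix is only read during a mini-batch and updated once at its end, many cases can be processed concurrently without write conflicts, which is the property the pipelined FPGA architecture relies on.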