Multilayer feedforward networks are universal approximators. Neural Networks.
Learning internal representations by error propagation. In: Parallel distributed processing: explorations in the microstructure of cognition, vol. 1.
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms.
Neural Networks: A Comprehensive Foundation.
SoC-Based Implementation of the Backpropagation Algorithm for MLP. In: HIS '08 Proceedings of the 2008 8th International Conference on Hybrid Intelligent Systems.
Computation of a nonlinear squashing function in digital neural networks. In: DDECS '08 Proceedings of the 2008 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems.
License Plate Recognition From Still Images and Video Sequences: A Survey. IEEE Transactions on Intelligent Transportation Systems.
The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study. IEEE Transactions on Neural Networks.
A scalable pipelined architecture for real-time computation of MLP-BP neural networks. Microprocessors & Microsystems.
FPGAs offer a promising platform for implementing Artificial Neural Networks (ANNs) and their training, combining custom optimized hardware with low cost and fast development time. However, purely hardware realizations tend to prioritize throughput, imposing restrictions on applicable network topology or resorting to low-precision data representation, whereas flexible solutions that allow wide variation of network parameters and training algorithms are usually restricted to software implementations. This paper proposes a mixed approach: a system-on-chip (SoC) implementation in which computations are carried out by a high-efficiency neural coprocessor with a large number of parallel processing elements. System flexibility is provided by on-chip software control and the use of floating-point arithmetic, while network parallelism is exploited through replicated logic and an application-specific coprocessor architecture, leading to fast training times. Performance results, design limitations, and trade-offs are discussed.
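As a point of reference for the computation being accelerated, the following is a minimal software sketch of MLP training by backpropagation (MLP-BP), the algorithm the coprocessor implements in hardware. The network size, learning rate, and XOR task are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    # The nonlinear "squashing function" that the cited DDECS '08 work
    # computes in digital hardware.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs
T = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

# Hypothetical 2-4-1 topology; weight scale and learning rate are guesses.
W1 = rng.normal(0.0, 0.5, (2, 4)); b1 = np.zeros(4)     # hidden layer
W2 = rng.normal(0.0, 0.5, (4, 1)); b2 = np.zeros(1)     # output layer
lr = 0.5
losses = []

for _ in range(5000):
    # Forward pass: the per-neuron multiply-accumulate plus squashing
    # function that parallel processing elements can compute concurrently.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((y - T) ** 2)))
    # Backward pass: propagate the output error and update the weights.
    dy = (y - T) * y * (1.0 - y)
    dh = (dy @ W2.T) * h * (1.0 - h)
    W2 -= lr * (h.T @ dy); b2 -= lr * dy.sum(axis=0)
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each weight update is independent across neurons in a layer, which is the parallelism that replicated logic in the coprocessor exploits; the trade-off discussed in the paper is between such throughput-oriented hardware and the flexibility of software-controlled, floating-point computation.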