This paper addresses the problem of accelerating large artificial neural networks (ANNs) whose topology and weights can evolve through a genetic algorithm. The proposed digital hardware architecture can process any evolved network topology whilst providing a good trade-off between throughput, area and power consumption; low power consumption is vital for longer battery life on mobile devices. Each processing element (PE) contains multiple parallel arithmetic units, and memory partitioning and data caching are used to minimise the effects of PE pipeline stalls. The activation function generator uses a first-order minimax polynomial approximation scheme tuned via a genetic algorithm. Efficient arithmetic circuitry, leveraging modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design.
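To make "any evolved network topology" concrete, the following software reference model evaluates a network described only by NEAT-style connection genes `(source, destination, weight)`. It is a minimal sketch under assumptions the abstract does not state: the evolved network is acyclic, node identifiers are integers, a bias can be modelled as an extra input fixed at 1.0, and every non-input node applies a logistic sigmoid. The names `evaluate` and `logistic` are hypothetical. Nodes are visited in dependency order with Python's standard `graphlib.TopologicalSorter`, so every operand is ready before it is used; this illustrates the ordering problem only and is not a description of the paper's PE scheduling.

```python
import math
from graphlib import TopologicalSorter


def logistic(z: float) -> float:
    """Numerically stable logistic sigmoid (assumed activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def evaluate(inputs: dict[int, float],
             connections: list[tuple[int, int, float]],
             outputs: list[int]) -> list[float]:
    """Evaluate an acyclic network given as (src, dst, weight) connection genes."""
    incoming: dict[int, list[tuple[int, float]]] = {}
    for src, dst, w in connections:
        incoming.setdefault(dst, []).append((src, w))
    # Each node depends on the source nodes of its incoming connections.
    graph = {dst: {src for src, _ in edges} for dst, edges in incoming.items()}
    for o in outputs:
        graph.setdefault(o, set())       # keep unconnected outputs in the schedule
    values = dict(inputs)
    # static_order() raises graphlib.CycleError (a ValueError) for recurrent genomes.
    for node in TopologicalSorter(graph).static_order():
        if node in values:               # input node: value supplied by the caller
            continue
        net = sum(values[src] * w for src, w in incoming.get(node, []))
        values[node] = logistic(net)
    return [values[o] for o in outputs]
```

With inputs `{0: 1.0, 1: 0.5}` and connections `[(0, 2, 1.0), (1, 2, -2.0), (2, 3, 0.5), (0, 3, 1.0)]`, hidden node 2 receives a net input of 0 and outputs 0.5, so output node 3 receives 1.25. Recurrent connections, which NEAT can also evolve, would need separate handling.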
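The first-order minimax approximation used for the activation function generator can be illustrated in software. This sketch assumes a logistic sigmoid σ and fixes eight uniform segments on [0, 8) with saturation beyond. The abstract states that its scheme is tuned via a genetic algorithm, and that tuning is not reproduced here. On a segment [a, b] where the function is concave (true for σ when x ≥ 0), the best uniform straight line has the chord slope c1 = (f(b) − f(a))/(b − a), and its error equioscillates at a, at the interior point ξ where f′(ξ) = c1, and at b. Because σ′(x) = σ(x)(1 − σ(x)), ξ has a closed form. Negative inputs use the identity σ(−x) = 1 − σ(x). The names `minimax_segment`, `build_table` and `approx_sigmoid` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def minimax_segment(a: float, b: float) -> tuple[float, float]:
    """Minimax line c0 + c1*x for the sigmoid on [a, b], assuming 0 <= a < b.

    The sigmoid is concave for x >= 0, so the error of the best line
    equioscillates at a, at xi (where sigmoid'(xi) equals the chord slope) and at b.
    """
    c1 = (sigmoid(b) - sigmoid(a)) / (b - a)          # chord slope
    # sigmoid'(x) = s * (1 - s) with s = sigmoid(x); take the root s >= 0.5 (x >= 0).
    s = 0.5 * (1.0 + math.sqrt(1.0 - 4.0 * c1))
    xi = math.log(s / (1.0 - s))                      # inverse sigmoid of s
    c0 = 0.5 * (sigmoid(a) - c1 * a + s - c1 * xi)    # error at a equals minus error at xi
    return c0, c1


def build_table(x_max: float = 8.0, segments: int = 8) -> tuple[list[tuple[float, float]], float]:
    """Uniform segments on [0, x_max) (an assumption; not the GA-tuned scheme)."""
    step = x_max / segments
    table = [minimax_segment(i * step, (i + 1) * step) for i in range(segments)]
    return table, step


def approx_sigmoid(x: float, table: list[tuple[float, float]], step: float) -> float:
    if x < 0:
        return 1.0 - approx_sigmoid(-x, table, step)  # sigmoid(-x) = 1 - sigmoid(x)
    i = int(x / step)
    if i >= len(table):
        return 1.0                                    # saturate beyond x_max
    c0, c1 = table[i]
    return c0 + c1 * x
```

For the segment [0, 1] this gives c1 ≈ 0.231 and ξ ≈ 0.565. The minimax error of a line on a segment of width h is at most h²·max|f″|/16, and |σ″| never exceeds about 0.096, so unit-width segments keep the error below roughly 0.006. A fixed-point hardware version could select the segment from the upper bits of the input.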
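Modified Booth recoding and carry-save addition can likewise be modelled with Python integers. In this sketch, radix-4 (modified) Booth recoding turns an n-bit two's-complement multiplier into n/2 digits in {−2, −1, 0, 1, 2}, halving the number of partial products. The rows are then reduced with 3:2 carry-save adders until only a sum row and a carry row remain, and a single carry-propagate addition produces the product. The simple chain of 3:2 steps stands in for the column compressors mentioned in the abstract. The word lengths, the 2n-bit result width and the names `booth_radix4_digits`, `carry_save_add` and `booth_multiply` are assumptions for illustration.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into radix-4 Booth digits."""
    bits = y & ((1 << n) - 1)                 # raw n-bit pattern of y
    digits = []
    prev = 0                                  # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)    # d = -2*y[2i+1] + y[2i] + y[2i-1]
        prev = b1
    return digits


def carry_save_add(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """Bitwise 3:2 compressor: a + b + c == s + cy (mod 2**width)."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1   # majority bits move one column left
    return s & mask, cy & mask


def booth_multiply(x: int, y: int, n: int) -> int:
    """Signed n-bit x*y via Booth partial products, CSA reduction and one final add."""
    assert n % 2 == 0
    assert -(1 << (n - 1)) <= x < (1 << (n - 1)) and -(1 << (n - 1)) <= y < (1 << (n - 1))
    width = 2 * n
    mask = (1 << width) - 1
    # One partial product per digit: d * x * 4**i, stored as a width-bit pattern.
    rows = [((d * x) << (2 * i)) & mask for i, d in enumerate(booth_radix4_digits(y, n))]
    # Reduce three rows to two until only a sum row and a carry row remain.
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(a, b, c, mask))
    total = sum(rows) & mask                  # final carry-propagate addition
    return total - (1 << width) if total >> (width - 1) else total   # back to signed
```

For example, `booth_radix4_digits(7, 4)` gives `[-1, 2]`, since −1 + 2·4 = 7, and `booth_multiply(-7, 13, 8)` returns -91. Carry-save adders avoid carry propagation until the final addition, which is why they suit partial-product reduction.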