A mixed hardware-software approach to flexible artificial neural network training on FPGA

  • Authors:
  • Ramón J. Aliaga; Rafael Gadea; Ricardo J. Colom; Joaquín Cerdá; Néstor Ferrando; Vicente Herrero

  • Affiliations:
  • Institute for the Implementation of Advanced Information and Communication Technology, Universidad Politécnica de Valencia, Valencia, Spain (all authors)

  • Venue:
  • SAMOS'09: Proceedings of the 9th International Conference on Systems, Architectures, Modeling and Simulation
  • Year:
  • 2009

Abstract

FPGAs offer a promising platform for the implementation of Artificial Neural Networks (ANNs) and their training, combining custom optimized hardware with low cost and fast development time. However, purely hardware realizations tend to focus on throughput, resorting to restrictions on the applicable network topology or to low-precision data representations, whereas flexible solutions allowing wide variation of network parameters and training algorithms are usually restricted to software implementations. This paper proposes a mixed approach, introducing a system-on-chip (SoC) implementation where computations are carried out by a high-efficiency neural coprocessor with a large number of parallel processing elements. System flexibility is provided by on-chip software control and the use of floating-point arithmetic, while network parallelism is exploited through replicated logic and an application-specific coprocessor architecture, leading to fast training time. Performance results are presented, and design limitations and trade-offs are discussed.
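To make the abstract's division of labor concrete, the sketch below is a minimal software model of the kind of training computation such a coprocessor would accelerate: a single-hidden-layer MLP trained by on-line backpropagation in floating point, where each neuron's multiply-accumulate (MAC) chain is the operation that replicated processing elements would execute in parallel. This is an illustrative assumption, not the paper's implementation; all sizes, the learning rate, and every function name here are hypothetical.

```python
# Hypothetical software model of coprocessor-accelerated ANN training:
# a 2-3-1 MLP learning XOR via on-line backpropagation (illustrative only).
import math
import random

random.seed(0)

N_IN, N_HID, N_OUT = 2, 3, 1   # illustrative topology
LR = 0.5                        # illustrative learning rate

# Weight matrices; the extra column holds the bias weight.
w_hid = [[random.uniform(-1, 1) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_out = [[random.uniform(-1, 1) for _ in range(N_HID + 1)] for _ in range(N_OUT)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x):
    # Each neuron is a floating-point MAC chain plus an activation;
    # these are the operations parallel processing elements replicate.
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, y

def train_step(x, target):
    h, y = forward(x)
    # Delta rule with the sigmoid derivative o*(1-o).
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
    d_hid = [hv * (1.0 - hv) * sum(d_out[k] * w_out[k][j] for k in range(N_OUT))
             for j, hv in enumerate(h)]
    for k in range(N_OUT):
        for j, v in enumerate(h + [1.0]):
            w_out[k][j] += LR * d_out[k] * v
    for j in range(N_HID):
        for i, v in enumerate(x + [1.0]):
            w_hid[j][i] += LR * d_hid[j] * v

XOR = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
       ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

def mse():
    return sum((t[0] - forward(x)[1][0]) ** 2 for x, t in XOR) / len(XOR)

err_before = mse()
for _ in range(5000):           # per-pattern (on-line) weight updates
    for x, t in XOR:
        train_step(x, t)
err_after = mse()
```

The on-line (per-pattern) update style is chosen here because it streams one training vector at a time, which is the access pattern a memory-constrained hardware pipeline would favor; the paper's actual architecture and training algorithm may differ.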