Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition

  • Authors:
  • Karthik Nagarajan;Brian Holland;Alan D. George;K. Clint Slatton;Herman Lam

  • Affiliations:
  • NSF Center for High-Performance Reconfigurable Computing (CHREC), Electrical and Computer Engineering Department, University of Florida, Gainesville, USA 32611 (all authors)

  • Venue:
  • Journal of Signal Processing Systems
  • Year:
  • 2011

Abstract

Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many are known to suffer from super-linear increases in computational time as the data size and the number of signals being processed (the data dimension) grow. Certain principal machine-learning algorithms are commonly found embedded in larger detection, estimation, or classification operations. Three such principal algorithms are Parzen window-based non-parametric estimation of probability density functions (PDFs), K-means clustering, and correlation. Because they form an integral part of numerous machine-learning applications, fast and efficient execution of these algorithms is extremely desirable. FPGA-based reconfigurable computing (RC) has been used successfully to accelerate computationally intensive problems in a wide variety of scientific domains, achieving speedup over traditional software implementations. However, this potential benefit is often not fully realized because efficient FPGA designs are generally created in a laborious, case-specific manner that requires a great deal of redundant time and effort. In this paper, an approach using pattern-based decomposition for algorithm acceleration on FPGAs is proposed that offers significant increases in productivity via design reusability. Using this approach, we design, analyze, and implement a multi-dimensional PDF estimation algorithm using Gaussian kernels on FPGAs. First, the algorithm's amenability to a hardware paradigm and its expected speedup are predicted. After implementation, actual speedup and performance metrics are compared to the predictions, showing speedup on the order of 20× over a 3.2 GHz processor. Multi-core architectures are developed to further improve performance by scaling the design. Portability of the hardware design across multiple FPGA platforms is also analyzed.
After implementing the PDF algorithm, the value of pattern-based decomposition to support reuse is demonstrated by rapid development of the K-means and correlation algorithms.
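
For readers unfamiliar with the first of the three algorithms, the following is a minimal software sketch of Parzen-window PDF estimation with Gaussian kernels. It is not the paper's FPGA design and is not taken from the paper. The function name `parzen_gaussian_pdf`, the single bandwidth parameter `h`, and the choice of an isotropic (same width in every dimension) kernel are assumptions made for illustration.

```python
import math


def parzen_gaussian_pdf(x, samples, h):
    """Estimate the density at point x from d-dimensional samples.

    Illustrative sketch only (hypothetical names, isotropic Gaussian kernel).
    x       : query point, a sequence of d floats
    samples : list of N sample points, each a sequence of d floats
    h       : kernel bandwidth, assumed positive and shared by all dimensions
    """
    d = len(x)
    n = len(samples)
    if n == 0 or h <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")

    # Normalization: average over N kernels, each a d-dimensional Gaussian
    # with standard deviation h in every dimension.
    norm = 1.0 / (n * (math.sqrt(2.0 * math.pi) * h) ** d)

    total = 0.0
    for s in samples:
        # Squared Euclidean distance between the query point and one sample.
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return norm * total
```

In this sketch each query costs on the order of N·d operations, so evaluating the estimate at M points costs on the order of M·N·d. That growth with both data size and dimension is the kind of cost the abstract refers to, and the independent per-sample kernel evaluations in the inner loop are the sort of work that can be computed in parallel.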