Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition

Authors:
Karthik Nagarajan;Brian Holland;Alan D. George;K. Clint Slatton;Herman Lam
Affiliation (all authors):
NSF Center for High-Performance Reconfigurable Computing (CHREC), Electrical and Computer Engineering Department, University of Florida, Gainesville, USA 32611
Venue:
Journal of Signal Processing Systems
Year:
2011

Abstract

Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many are known to suffer from super-linear increases in computational time as the data size and the number of signals being processed (data dimension) grow. Certain principal machine-learning algorithms are commonly embedded in larger detection, estimation, or classification operations. Three such principal algorithms are Parzen window-based, non-parametric estimation of probability density functions (PDFs), K-means clustering, and correlation. Because they form an integral part of numerous machine-learning applications, fast and efficient execution of these algorithms is highly desirable. FPGA-based reconfigurable computing (RC) has been used successfully to accelerate computationally intensive problems in a wide variety of scientific domains, achieving speedup over traditional software implementations. However, this potential benefit is often not fully realized, because creating efficient FPGA designs is generally a laborious, case-specific process that requires a great deal of redundant time and effort. In this paper, we propose an approach to algorithm acceleration on FPGAs based on pattern-based decomposition, which offers significant increases in productivity through design reuse. Using this approach, we design, analyze, and implement a multi-dimensional PDF estimation algorithm using Gaussian kernels on FPGAs. First, the algorithm's amenability to a hardware paradigm and its expected speedup are predicted. After implementation, actual speedup and performance metrics are compared to the predictions, showing speedup on the order of 20× over a 3.2 GHz processor. Multi-core architectures are developed to further improve performance by scaling the design. Portability of the hardware design across multiple FPGA platforms is also analyzed.
After implementing the PDF algorithm, we demonstrate the value of pattern-based decomposition in supporting reuse through the rapid development of the K-means and correlation algorithms.
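The abstract names Parzen-window PDF estimation with Gaussian kernels and K-means clustering but gives no formulas. As a rough software reference point, the sketch below assumes an isotropic Gaussian kernel with a single bandwidth h shared by all d dimensions, so the estimate at a point x from N samples x_i is p(x) = (1/N) Σ_i (2πh²)^(-d/2) exp(-‖x − x_i‖² / (2h²)). It also assumes plain Lloyd's-algorithm K-means seeded with the first k points. These are illustrative assumptions, not details taken from the paper. The function names (`sq_dist`, `parzen_pdf`, `kmeans`) are hypothetical, and this is a CPU baseline only, not the authors' FPGA design.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def parzen_pdf(x: Point, samples: List[Point], h: float) -> float:
    """Parzen-window PDF estimate at x.

    Assumption: isotropic Gaussian kernel with the same bandwidth h
    in every dimension.
    """
    d = len(x)
    # Normalizing constant of a d-dimensional Gaussian with variance h^2 per axis.
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)
    # Sum one kernel contribution per sample; this loop dominates the cost.
    total = sum(math.exp(-sq_dist(x, s) / (2.0 * h * h)) for s in samples)
    return norm * total / len(samples)


def kmeans(points: List[Point], k: int,
           iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's-algorithm K-means.

    Assumption: the first k points are used as initial centroids so the
    result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels
```

Both routines spend nearly all their time on the same squared-distance computation between one point and every sample or centroid. For PDF estimation this work repeats for every evaluation point, and for K-means it repeats for every point in each iteration. Because these distance evaluations are independent of one another, they are the natural candidate for parallel hardware.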