SIAM Journal on Scientific and Statistical Computing
Design patterns: elements of reusable object-oriented software
Design patterns: elements of reusable object-oriented software
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Applications of Video-Content Analysis and Retrieval
IEEE MultiMedia
Generating Parallel Programs from the Wavefront Design Pattern
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Hidden Markov modeling and fuzzy controllers in FPGAs
FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
Design Patterns for Reconfigurable Computing
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data
CAMP '05 Proceedings of the Seventh International Workshop on Computer Architecture for Machine Perception
An Analysis of the Double-Precision Floating-Point FFT on FPGAs
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
N-Dimensional Probablility Density Function Transfer and its Application to Colour Transfer
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Cross-modal correlation learning for clustering on image-audio dataset
Proceedings of the 15th international conference on Multimedia
Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting
RAT: a methodology for predicting performance in application design migration to FPGAs
HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
Machine Learning for Audio, Image and Video Analysis: Theory and Applications (Advanced Information and Knowledge Processing)
Integrated Image and Speech Analysis for Content-Based Video Indexing
ICMCS '96 Proceedings of the 1996 International Conference on Multimedia Computing and Systems
IEEE Transactions on Signal Processing
An analytical model for multilevel performance prediction of Multi-FPGA systems
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Scalable multi-access flash store for big data analytics
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Hi-index | 0.00 |
Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many are known to suffer from super-linear increases in computational time with increasing data size and number of signals being processed (data dimension). Certain principal machine-learning algorithms are commonly found embedded in larger detection, estimation, or classification operations. Three such principal algorithms are the Parzen window-based, non-parametric estimation of Probability Density Functions (PDFs), K-means clustering and correlation. Because they form an integral part of numerous machine-learning applications, fast and efficient execution of these algorithms is extremely desirable. FPGA-based reconfigurable computing (RC) has been successfully used to accelerate computationally intensive problems in a wide variety of scientific domains to achieve speedup over traditional software implementations. However, this potential benefit is quite often not fully realized because creating efficient FPGA designs is generally carried out in a laborious, case-specific manner requiring a great amount of redundant time and effort. In this paper, an approach using pattern-based decomposition for algorithm acceleration on FPGAs is proposed that offers significant increases in productivity via design reusability. Using this approach, we design, analyze, and implement a multi-dimensional PDF estimation algorithm using Gaussian kernels on FPGAs. First, the algorithm's amenability to a hardware paradigm and expected speedups are predicted. After implementation, actual speedup and performance metrics are compared to the predictions, showing speedup on the order of 20脳 over a 3.2 GHz processor. Multi-core architectures are developed to further improve performance by scaling the design. Portability of the hardware design across multiple FPGA platforms is also analyzed. After implementing the PDF algorithm, the value of pattern-based decomposition to support reuse is demonstrated by rapid development of the K-means and correlation algorithms.