A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification
ACM Transactions on Architecture and Code Optimization (TACO)
We present a massively parallel FPGA-based coprocessor for Support Vector Machines (SVMs), a machine learning algorithm whose applications include recognition tasks, such as learning scenes, situations, and concepts, and reasoning tasks, such as analyzing the recognized scenes and semantics. The coprocessor architecture, targeted at both SVM training and classification, is based on clusters of vector processing elements (VPEs) operating in single-instruction multiple-data (SIMD) mode to exploit the large amounts of data parallelism in the application. We use the FPGA's DSP elements as parallel multiply-accumulators (MACs), the core computation in SVMs. A key feature of the architecture is that it is customized for low-precision arithmetic, which permits one DSP unit to perform two or more MACs in parallel. Low precision also reduces the required number of parallel off-chip memory accesses by packing multiple data words onto the FPGA-memory bus. We have built a prototype using an off-the-shelf PCI-based FPGA card with a Xilinx Virtex 5 FPGA and 1 GB of DDR2 memory. For SVM training, we observe application-level end-to-end computation speeds of over 9 billion multiply-accumulates per second (GMACs). For SVM classification, using data packing, the application speed increases to 14 GMACs. The FPGA-based system is about 20x faster than a dual 2.2 GHz Opteron CPU, and dissipates around 10 W of power.
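To illustrate the operand-packing idea described above, here is a minimal C sketch of how one wide multiply can yield two low-precision products at once, analogous to mapping two MACs onto a single DSP unit. The function name, the 8-bit unsigned operand width, and the 16-bit packing offset are illustrative assumptions, not details from the paper's hardware design:

```c
#include <stdint.h>

/* Sketch (assumed widths, not the paper's RTL): multiply one 8-bit
 * coefficient against two 8-bit data words packed into a single wide
 * operand, obtaining both products from a single multiplication. */
static inline void packed_mac(uint8_t a, uint8_t b_hi, uint8_t b_lo,
                              uint32_t *acc_hi, uint32_t *acc_lo)
{
    /* Pack b_hi and b_lo 16 bits apart. Because a*b <= 255*255 < 2^16,
     * the two partial products cannot overlap in the wide result. */
    uint32_t packed  = ((uint32_t)b_hi << 16) | b_lo;
    uint64_t product = (uint64_t)a * packed;   /* one multiply, two products */

    *acc_lo += (uint32_t)(product & 0xFFFFu);  /* a * b_lo */
    *acc_hi += (uint32_t)(product >> 16);      /* a * b_hi */
}
```

With this trick, a dot product of one coefficient vector against two data vectors costs only one multiply per element pair; in hardware the same packing lets a single wide DSP multiplier serve two SIMD lanes, provided enough guard bits separate the packed operands.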