FPGA-based hardware acceleration for local complexity analysis of massive genomic data

  • Authors:
  • Agathoklis Papadopoulos;Ioannis Kirmitzoglou;Vasilis J. Promponas;Theocharis Theocharides

  • Affiliations:
  • KIOS Research Center, Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Street, P.O. Box 20537, Nicosia 1678, Cyprus;Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, 75 Kallipoleos Street, P.O. Box 20537, Nicosia 1678, Cyprus;Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, 75 Kallipoleos Street, P.O. Box 20537, Nicosia 1678, Cyprus;KIOS Research Center, Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Street, P.O. Box 20537, Nicosia 1678, Cyprus

  • Venue:
  • Integration, the VLSI Journal
  • Year:
  • 2013

Abstract

While genomics has significantly advanced modern biology, it requires extensive computational power, traditionally provided by large-scale cluster machines and multi-core systems. However, emerging research results show that FPGA-based acceleration of algorithms for genomic applications greatly improves performance and energy efficiency compared to multi-core systems and clusters. In this work, we present a parallel hardware acceleration architecture for the CAST (Complexity Analysis of Sequence Tracts) algorithm, employed by biologists for complexity analysis of protein sequences encoded in genomic data. CAST is used for detecting (and subsequently masking) low-complexity regions (LCRs) in protein sequences. We designed and implemented the CAST accelerator architecture and built an FPGA prototype in order to benchmark its performance against serial and multithreaded software implementations of the CAST algorithm. The proposed architecture achieves speedups of approximately 100x to 5000x over both serial and multithreaded software CAST implementations, depending on the system configuration and on dataset features such as low-complexity content and sequence length distribution. Such performance may enable complex analyses of voluminous sequence datasets, and the architecture has the potential to interoperate with other hardware architectures for protein sequence analysis.
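
To make the notion of detecting and masking LCRs concrete, here is a minimal sketch in Python. It is not the CAST algorithm, which the abstract does not describe in detail. It swaps in a simple sliding-window Shannon entropy filter as a generic stand-in. The function names, the window length of 12 residues, and the 2.2-bit threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues with 'X' wherever a window's entropy is at or below threshold.

    Hypothetical illustration only: window and threshold are assumed values,
    and this entropy filter is not the scoring scheme used by CAST.
    """
    masked = list(seq)
    # Slide a fixed-length window along the sequence; sequences shorter
    # than the window are returned unchanged.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


if __name__ == "__main__":
    # A compositionally diverse flank around a poly-glutamine tract.
    example = "ACDEFGHIKLMN" + "Q" * 20 + "PRSTVWYACDEF"
    print(mask_low_complexity(example))
```

Every window is scored independently of its neighbours. This per-window independence is one reason such analyses can suit parallel hardware, although the abstract does not say how the CAST accelerator itself is organised.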