Fine-grained parallel RNA secondary structure prediction using SCFGs on FPGA

  • Authors:
  • Fei Xia;Yong Dou;Dan Zhou;Xin Li

  • Affiliations:
  • National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China (all authors)

  • Venue:
  • Parallel Computing
  • Year:
  • 2010

Abstract

In the field of RNA secondary structure prediction, the CYK (Cocke-Younger-Kasami) algorithm is one of the most popular methods using an SCFG (stochastic context-free grammar) model. Accelerating SCFG computations for large models and large RNA database searches is a challenging task in computational bioinformatics, because the parallel efficiency of general-purpose computer systems is limited by the O(L^3) computational complexity and by complicated data dependences. Furthermore, large-scale parallel computers are too expensive to be easily accessible to many research institutes. Recently, FPGA chips have emerged as a promising application accelerator that can speed up the CYK algorithm through fine-grained custom design. We propose a systolic-like array structure, consisting of one master PE and multiple slave PEs, for a fine-grained hardware implementation on FPGA that accelerates the CYK/inside algorithm with Query-Dependent Banding (QDB) heuristics. We partition the tasks by columns and assign them to PEs for load balance. We exploit data reuse schemes to reduce the need to load matrices from external memory. The experimental results show a speedup of more than 14x over Infernal 1.0 with QDB optimization, running on a PC platform with an Intel dual-core 2.5 GHz CPU, for the alignment of a single long RNA sequence to a large CM model with thousands of states. For large-scale database alignment applications (cmsearch) with multiple input sequences, the computational power of our accelerator is comparable to that of a PC cluster consisting of 16 Intel Xeon 2.0 GHz quad-core CPUs, while its power consumption is only about 10% of that of the cluster.
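
To make the O(L^3) cost and the data dependences mentioned in the abstract concrete, the following is a minimal software sketch of a CYK-style recursion in Python. It is not the authors' FPGA design and not Infernal's covariance-model implementation: the single-nonterminal grammar, the log-score constants, and the function name `cyk_score` are all hypothetical choices made for illustration, and QDB banding is omitted. Each cell for a subsequence depends on cells for shorter subsequences, and the bifurcation rule adds an inner loop over split points, which is where the cubic complexity comes from.

```python
import math

NEG_INF = float("-inf")

# Toy log-scores chosen for illustration only (assumptions, not from the
# paper, and not a normalized probability distribution).
LOG_PAIR = math.log(0.3)       # S -> x S y, where x and y form a base pair
LOG_UNPAIRED = math.log(0.2)   # S -> x S  or  S -> S x
LOG_BIFURCATE = math.log(0.1)  # S -> S S
LOG_END = math.log(0.2)        # S -> empty

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def cyk_score(seq):
    """Return the best log-score of parsing seq with the toy grammar."""
    n = len(seq)
    # best[i][j] scores the half-open subsequence seq[i:j].
    best = [[NEG_INF] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = LOG_END  # empty subsequence

    # Fill by increasing subsequence length so dependencies are ready.
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            # Leave the leftmost or rightmost base unpaired.
            score = max(LOG_UNPAIRED + best[i + 1][j],
                        LOG_UNPAIRED + best[i][j - 1])
            # Pair the two end bases if they are complementary.
            if span >= 2 and (seq[i], seq[j - 1]) in PAIRS:
                score = max(score, LOG_PAIR + best[i + 1][j - 1])
            # Bifurcation: try every split point (the O(L) inner loop).
            for k in range(i + 1, j):
                score = max(score, LOG_BIFURCATE + best[i][k] + best[k][j])
            best[i][j] = score
    return best[0][n]


if __name__ == "__main__":
    print(cyk_score("GGGAAACCC"))
```

In this sketch, cells that share the same end index `j` form one column of the triangular table. That is the unit the abstract describes distributing across PEs, although the actual assignment policy and data reuse scheme are not specified in the abstract and are not modeled here.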