The Forward Stem Matrix: An Efficient Data Structure for Finding Hairpins in RNA Secondary Structures

  • Authors:
  • Richard Beal; Donald Adjeroh; Ahmed Abbasi

  • Affiliations:
  • West Virginia University, Morgantown, WV 26506; West Virginia University, Morgantown, WV 26506; University of Virginia, Charlottesville, VA 22903

  • Venue:
  • Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
  • Year:
  • 2013

Abstract

With the rapid growth in available genomic data, developing robust and efficient methods for identifying RNA secondary structure elements, such as hairpins, has become a significant challenge in computational biology. Such methods have potential applications in the prediction of RNA secondary and tertiary structures, functional classification of RNA structures, microRNA target prediction, and discovery of RNA structure motifs. In this work, we propose the Forward Stem Matrix (FSM), a data structure that efficiently represents all k-length stem options, for k ∈ K, within an n-length RNA sequence T. We show that the FSM is of size O(n|K|) and still permits efficient access to stems. We provide a linear, O(n|K|)-time construction for the FSM using suffix arrays and data structures related to the Longest Previous Factor (LPF), namely the Furthest Previous Non-Overlapping Factor (FPnF) and Furthest Previous Factor (FPF) arrays. We also provide new constructions for the FPnF and FPF arrays via a novel application of parameterized string (p-string) theory and suffix trees. As an application of the FSM, we show how to efficiently find all hairpin structures in an RNA sequence. Experimental results show the practical performance of the proposed data structures.
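
To make the notion of a k-length stem concrete, the following is a minimal brute-force sketch in Python. It is not the FSM construction described in the abstract and does not use suffix arrays or the FPnF/FPF arrays; it is only a naive baseline that tries every position pair. The function names, the `min_loop` parameter, and the restriction to Watson-Crick pairs (A-U, G-C) are assumptions made for illustration.

```python
# Illustrative only: a naive baseline, not the paper's FSM construction.
# Assumption: only Watson-Crick pairs count; G-U wobble pairs are ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(s):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def naive_hairpins(t, k, min_loop=3):
    """Return all (i, j) such that t[i:i+k] can pair with t[j:j+k].

    The two arms must be separated by at least `min_loop` unpaired bases
    (a hypothetical parameter, not taken from the paper).
    """
    hits = []
    n = len(t)
    for i in range(n - k + 1):
        # The right arm must read as the reverse complement of the left arm.
        target = reverse_complement(t[i:i + k])
        for j in range(i + k + min_loop, n - k + 1):
            if t[j:j + k] == target:
                hits.append((i, j))
    return hits
```

For example, `naive_hairpins("GGGAAAACCC", 3)` reports the single stem whose left arm starts at position 0 and whose right arm starts at position 7. The double loop with a substring comparison makes this baseline roughly quadratic in n for each k, which is the kind of cost a compact precomputed structure is meant to avoid.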