With the rapid growth in available genomic data, developing robust and efficient methods for identifying RNA secondary structure elements, such as hairpins, has become a significant challenge in computational biology. Potential applications include the prediction of RNA secondary and tertiary structures, functional classification of RNA structures, microRNA target prediction, and discovery of RNA structure motifs. In this work, we propose the Forward Stem Matrix (FSM), a data structure that efficiently represents all k-length stem options, for k ∈ K, within an n-length RNA sequence T. We show that the FSM has size O(n|K|) and still permits efficient access to stems. We provide a linear O(n|K|) construction for the FSM using suffix arrays and data structures related to the Longest Previous Factor (LPF), namely the Furthest Previous Non-Overlapping Factor (FPnF) and Furthest Previous Factor (FPF) arrays. We also provide new constructions for the FPnF and FPF arrays via a novel application of parameterized string (p-string) theory and suffix trees. As an application of the FSM, we show how to efficiently find all hairpin structures in an RNA sequence. Experimental results show the practical performance of the proposed data structures.
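
To make the hairpin-finding task concrete, the following is a minimal brute-force sketch in Python. It is not the FSM construction described above, and it uses no suffix arrays or LPF-related arrays. It only illustrates what a k-length stem with an unpaired loop looks like. The function names, the restriction to Watson-Crick pairs (A-U, C-G), and the `min_loop` and `max_loop` bounds are all assumptions made for illustration and do not come from the paper.

```python
# Assumption: only Watson-Crick pairs (A-U, C-G) are allowed;
# wobble pairs such as G-U are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def find_hairpins(t, k, min_loop=3, max_loop=8):
    """Brute-force list of (i, j) such that t[i:i+k] pairs with t[j:j+k].

    i is the start of the 5' arm, j is the start of the 3' arm, and the
    unpaired loop between the arms has length j - (i + k).
    The loop bounds are hypothetical defaults, not values from the paper.
    """
    n = len(t)
    hits = []
    for i in range(n - k + 1):
        # The 3' arm must read as the reverse complement of the 5' arm.
        target = reverse_complement(t[i:i + k])
        for loop in range(min_loop, max_loop + 1):
            j = i + k + loop
            if j + k > n:
                break  # longer loops would run past the end of t
            if t[j:j + k] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # GGG pairs with CCC around the 4-base loop AAAA.
    print(find_hairpins("GGGAAAACCC", 3))
```

This scan takes O(n · k · (max_loop - min_loop + 1)) time for a single stem length k and must be repeated for every k ∈ K. Avoiding that kind of repeated work is the motivation for a precomputed structure such as the FSM.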