Generalization of a Suffix Tree for RNA Structural Pattern Matching

Authors:
Tetsuo Shibuya
Affiliations:
IBM Tokyo Research Laboratory, 1623-14 Shimotsuruma, Yamato-shi, Kanagawa 242-8502, Japan
Venue:
Algorithmica
Year:
2004

Abstract

In molecular biology, it is said that biological sequences with similar three-dimensional structures tend to have similar properties. Hence, it is very important to search databases not only for sequences that are similar in the string sense, but also for sequences that are structurally similar. In this paper we propose a new data structure that generalizes the parameterized suffix tree (p-suffix tree for short) introduced by Baker. We call it the structural suffix tree, or s-suffix tree for short. The s-suffix tree can be used to find structurally related patterns in RNA or single-stranded DNA. We also propose an O(n(log|Σ| + log|Π|)) on-line algorithm for constructing it, where n is the sequence length, |Σ| is the size of the normal alphabet, and |Π| is the size of the alphabet of so-called “parameters,” which is related to the structure of the sequence. Our algorithm runs in linear time when used to analyze RNA and DNA sequences, since both alphabets are then of constant size. Moreover, when used to construct p-suffix trees, it is the first on-line algorithm, although its time bound is the same as that of Kosaraju’s algorithm, the best previously known. We also report computational experiments on real RNA and DNA sequences that demonstrate the practicality of our algorithm.
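Baker's p-suffix tree, which the s-suffix tree generalizes, rests on the prev encoding of a parameterized string over constant characters Σ and parameter characters Π: each parameter character is replaced by the distance to its previous occurrence (0 if there is none), and two strings p-match exactly when their encodings agree. The Python sketch below is a minimal illustration under those assumptions, not the s-suffix tree or its on-line construction: it does not model the complementary pairing that structural matching needs, it builds a quadratic-size uncompacted trie rather than a linear-size tree, and the names `prev_encode`, `p_match` and `PSuffixTrie` are hypothetical. It also assumes constant characters are non-integer symbols (here single-character strings), so they cannot collide with the integer distances.

```python
from typing import Dict, Hashable, List, Sequence, Set


def prev_encode(s: Sequence[Hashable], params: Set[Hashable]) -> List[Hashable]:
    """Baker-style prev encoding of a parameterized string (p-string).

    Constant characters are kept unchanged; each parameter character is
    replaced by the distance back to its previous occurrence, or 0 if it
    has not occurred before. Assumes constants are not integers.
    """
    last_seen: Dict[Hashable, int] = {}
    out: List[Hashable] = []
    for i, c in enumerate(s):
        if c in params:
            out.append(i - last_seen[c] if c in last_seen else 0)
            last_seen[c] = i
        else:
            out.append(c)
    return out


def p_match(x: Sequence[Hashable], y: Sequence[Hashable], params: Set[Hashable]) -> bool:
    """Two equal-length strings p-match iff their prev encodings are identical."""
    return len(x) == len(y) and prev_encode(x, params) == prev_encode(y, params)


class PSuffixTrie:
    """Hypothetical uncompacted trie over the prev encodings of all suffixes.

    Teaching sketch only: O(n^2) time and space, unlike the linear-size
    p-suffix tree, and it ignores base complementarity (so it is not an
    s-suffix tree).
    """

    END = object()  # sentinel key: suffixes starting at the stored positions end here

    def __init__(self, text: Sequence[Hashable], params: Set[Hashable]) -> None:
        self.params = params
        self.root: dict = {}
        for start in range(len(text)):
            # Each suffix is re-encoded from scratch: an occurrence whose previous
            # copy lies before `start` becomes 0, so prev(text[start:]) is in
            # general NOT a suffix of prev(text).
            node = self.root
            for tok in prev_encode(text[start:], params):
                node = node.setdefault(tok, {})
            node.setdefault(self.END, []).append(start)

    def find(self, pattern: Sequence[Hashable]) -> List[int]:
        """Return the sorted start positions where `pattern` p-matches the text."""
        node = self.root
        # The first m tokens of prev(text[i:]) equal prev(text[i:i+m]),
        # so walking prev(pattern) from the root reaches every p-match.
        for tok in prev_encode(pattern, self.params):
            if tok not in node:
                return []
            node = node[tok]
        hits: List[int] = []
        stack = [node]
        while stack:  # collect the start of every suffix ending in this subtree
            cur = stack.pop()
            for key, child in cur.items():
                if key is self.END:
                    hits.extend(child)
                else:
                    stack.append(child)
        return sorted(hits)


if __name__ == "__main__":
    trie = PSuffixTrie("xaxyay", {"x", "y"})
    print(trie.find("xax"))  # [0, 3]: "yay" p-matches "xax" by renaming y to x
```

Because of this re-encoding, the encoded suffixes are not simply suffixes of one encoded string, so techniques for ordinary suffix trees do not carry over directly; the sketch sidesteps the issue by brute force.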