In molecular biology, two biological sequences are said to tend to have similar properties if they have similar three-dimensional structures. It is therefore important to search databases not only for sequences that are similar as strings, but also for sequences that are structurally similar. In this paper we propose a new data structure that generalizes the parameterized suffix tree (p-suffix tree for short) introduced by Baker. We call it the structural suffix tree, or s-suffix tree for short. The s-suffix tree can be used to find structurally related patterns in RNA or single-stranded DNA. We also propose an O(n(log|Σ| + log|Π|)) on-line algorithm for constructing it, where n is the sequence length, |Σ| is the size of the normal alphabet, and |Π| is the size of the alphabet called the “parameter” alphabet, which is related to the structure of the sequence. The algorithm runs in linear time when used to analyze RNA and DNA sequences, because the alphabets involved then have constant size. Moreover, viewed as an algorithm for constructing the p-suffix tree, it is the first on-line algorithm, although its time bound is the same as that of Kosaraju’s best-known algorithm. Results of computational experiments on actual RNA and DNA sequences are also given to demonstrate the algorithm’s practicality.
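To make the notion of parameterized matching behind Baker's p-suffix tree concrete, the following is a minimal sketch of the standard "prev" encoding: each parameter symbol is replaced by the distance to its previous occurrence (0 if there is none), while normal symbols are kept as they are. Two equal-length strings p-match exactly when their encodings are equal. The function names `prev_encode` and `p_match` and the example alphabets are illustrative assumptions. This is not the s-suffix tree or the on-line construction proposed in the paper, only the underlying idea the s-suffix tree generalizes.

```python
def prev_encode(s, params):
    """Return the prev encoding of s (hypothetical helper, for illustration).

    Symbols in `params` (the parameter alphabet) are replaced by the distance
    to their previous occurrence, or 0 at their first occurrence.
    All other symbols (the normal alphabet) are kept unchanged.
    """
    last_seen = {}  # parameter symbol -> index of its most recent occurrence
    encoded = []
    for i, ch in enumerate(s):
        if ch in params:
            encoded.append(i - last_seen[ch] if ch in last_seen else 0)
            last_seen[ch] = i
        else:
            encoded.append(ch)
    return encoded


def p_match(a, b, params):
    """True if a and b are equal up to a one-to-one renaming of parameters."""
    return len(a) == len(b) and prev_encode(a, params) == prev_encode(b, params)


if __name__ == "__main__":
    params = {"x", "y", "z"}            # assumed parameter alphabet
    print(prev_encode("xaxy", params))  # parameters become distances
    print(p_match("xayx", "yazy", params))
    print(p_match("xayx", "yazz", params))
```

Because the encoding of a suffix depends on where the suffix starts, a p-suffix tree cannot simply reuse an ordinary suffix tree construction, which is part of what makes on-line construction nontrivial.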