Generalization of a Suffix Tree for RNA Structural Pattern Matching

  • Authors:
  • Tetsuo Shibuya

  • Venue:
  • SWAT '00 Proceedings of the 7th Scandinavian Workshop on Algorithm Theory
  • Year:
  • 2000

Abstract

In molecular biology, it is widely held that two biological sequences tend to have similar properties if they have similar 3-D structures. It is therefore important to search databases not only for sequences that are similar as strings, but also for sequences that are structurally similar. In this paper, we propose a new data structure that generalizes the parameterized suffix tree (p-suffix tree for short) introduced by Baker. This data structure can be used to find structurally related patterns in RNA or single-stranded DNA. We also propose an O(n(log |Σ| + log |Π|)) on-line algorithm for constructing it, where n is the sequence length, |Σ| is the size of the ordinary alphabet, and |Π| is the size of the "parameter" alphabet, which is related to the structure of the sequence. Because these alphabets have constant size for RNA and DNA, the algorithm runs in linear time on such sequences. Moreover, it is the first on-line algorithm for constructing the p-suffix tree, although its time bound is the same as that of Kosaraju's algorithm, the best previously known. Results of computational experiments on actual RNA and DNA sequences are also given to demonstrate the algorithm's practicality.
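To make the matching notions behind the abstract concrete, here is a minimal sketch, not the paper's construction algorithm. `prev_encode` follows Baker's standard "prev" encoding for parameterized strings: two strings p-match exactly when their encodings are equal. `structural_encode` is an illustrative assumption about how complementarity (A-U, C-G) could be folded into such an encoding; the abstract does not specify the encoding, so the paper's definition may differ. All function names and the `COMPLEMENT` table are hypothetical.

```python
# Assumed RNA base pairing; hypothetical table used only for illustration.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def prev_encode(s, params):
    """Baker-style prev encoding.

    Each parameter symbol is replaced by the distance to its previous
    occurrence (0 for a first occurrence); fixed symbols are kept as-is.
    """
    last = {}  # parameter symbol -> index of its most recent occurrence
    out = []
    for i, ch in enumerate(s):
        if ch in params:
            out.append(i - last[ch] if ch in last else 0)
            last[ch] = i
        else:
            out.append(ch)
    return out


def p_match(a, b, params):
    """Two strings p-match iff their prev encodings are identical."""
    return prev_encode(a, params) == prev_encode(b, params)


def structural_encode(s, complement=COMPLEMENT):
    """Complement-aware encoding (an assumption, not taken from the abstract).

    0        : neither the symbol nor its complement has appeared before
    positive : distance to the nearest earlier position holding the same symbol
    negative : distance to the nearest earlier position holding the complement
    """
    last = {}
    out = []
    for i, ch in enumerate(s):
        j_same = last.get(ch, -1)
        j_comp = last.get(complement[ch], -1)
        if j_same < 0 and j_comp < 0:
            out.append(0)
        elif j_same > j_comp:
            out.append(i - j_same)
        else:
            out.append(-(i - j_comp))
        last[ch] = i
    return out


def structural_match(a, b, complement=COMPLEMENT):
    """Strings match structurally (in this sketch) iff encodings are equal."""
    return structural_encode(a, complement) == structural_encode(b, complement)
```

Under these assumptions, `structural_match("AUCG", "CGAU")` holds because both strings consist of a base followed by its complement, twice, while `structural_match("AUCG", "AACG")` does not. A suffix tree built over such encodings, rather than over raw characters, is the general idea that a p-suffix tree relies on; this quadratic-free but tree-free sketch only shows the comparison step.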