Approximate matching of secondary structures

Authors:
Nadia El-Mabrouk; Mathieu Raffinot
Affiliations:
Université de Montréal, Montréal, Québec, Canada; CNRS, Equipe Génome et Informatique, Evry, France
Venue:
Proceedings of the sixth annual international conference on Computational biology
Year:
2002

Abstract

Several methods have been developed for identifying more or less complex RNA structures in a genome. Whatever the method, it is always based on searching for conserved primary and secondary structures. While various efficient methods have been developed for searching motifs of the primary structure, usually represented as regular expressions, little effort has been devoted to the efficient search of secondary structure signals. By a helix, we mean a structure defined by a combination of sequence and folding constraints. We present a flexible algorithm that searches for all approximate matches of a helix in a genome. Helices are represented by special regular expressions that we call secondary expressions. The method is based on an alignment graph constructed from several copies of a pushdown automaton, arranged one on top of another. The worst-case time complexity is O(rpn), where n is the size of the genome, p is the size of the secondary expression, and r is its number of union symbols. We present the results of searching for specific signals of the tRNA and RNase P RNA in two genomes.
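To make the notion of an approximate helix match concrete, the following is a minimal brute-force sketch. It is not the pushdown-automaton algorithm of the paper and does not achieve the O(rpn) bound. It assumes a helix is a simple hairpin: two strands of fixed length that pair base by base, separated by a loop of bounded length, with a limited number of non-pairing positions allowed. The function name `find_helices`, its parameters, and the pairing table (Watson-Crick plus G-U wobble pairs) are all illustrative assumptions, not taken from the paper.

```python
# Illustrative brute-force search for hairpin-like helices.
# Assumption: this is NOT the automaton-based method described in the abstract.

# Assumed pairing rule: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(seq, stem_len, min_loop, max_loop, max_mismatches):
    """Return (start, loop_len, mismatches) for every candidate helix in seq.

    A candidate consists of a left strand of stem_len bases starting at
    `start`, a loop of loop_len bases, and a right strand of stem_len bases.
    Position k of the left strand must pair with position k (counted from
    the end) of the right strand; up to max_mismatches positions may fail.
    """
    hits = []
    n = len(seq)
    for start in range(n):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len  # one past the last base
            if end > n:
                break  # longer loops cannot fit either
            mismatches = 0
            for k in range(stem_len):
                left = seq[start + k]
                right = seq[end - 1 - k]
                if (left, right) not in PAIRS:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many errors, abandon this candidate
            if mismatches <= max_mismatches:
                hits.append((start, loop_len, mismatches))
    return hits


if __name__ == "__main__":
    # GGG pairs with CCC around the loop AAAU: an exact helix.
    print(find_helices("GGGAAAUCCC", 3, 4, 4, 0))
    # The third stem position (A against C) fails: found only if one error is allowed.
    print(find_helices("GGAAAAUCCC", 3, 4, 4, 1))
```

This sketch examines every start position and loop length explicitly, so its cost grows with the product of the genome length, the loop range, and the stem length. It also handles only mismatches, not insertions or deletions, and supports no unions or sequence constraints. The secondary expressions and alignment graph described in the abstract are what let the authors cover those richer cases within the stated bound.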