Regular expression constrained sequence alignment

Authors:
Abdullah N. Arslan
Affiliations:
Department of Computer Science, The University of Vermont, Burlington, VT 05405, USA
Venue:
Journal of Discrete Algorithms
Year:
2007

Abstract

We introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between given strings $S_1$ and $S_2$ over all alignments such that in these alignments there exists a segment where some substring $s_1$ of $S_1$ is aligned to some substring $s_2$ of $S_2$, and both $s_1$ and $s_2$ match a given regular expression $R$, i.e. $s_1, s_2 \in L(R)$, where $L(R)$ is the regular language described by $R$. For complexity results we assume, without loss of generality, that $n = |S_1| \ge m = |S_2|$. A motivation for the problem is that protein sequences can be aligned in a way that known motifs guide the alignments. We present an $O(nmr)$ time algorithm for the regular expression constrained sequence alignment problem where $r = O(t^4)$, and $t$ is the number of states of a nondeterministic finite automaton $N$ that accepts $L(R)$. We use in our algorithm a nondeterministic weighted finite automaton $M$ that we construct from $N$. $M$ has $O(t^2)$ states where the transition-weights are obtained from the given costs of edit operations, and state-weights correspond to optimum alignment scores we compute using the underlying dynamic programming solution for sequence alignment. If we are given a deterministic finite automaton $D$ accepting $L(R)$ with $t_d$ states then our construction creates a deterministic finite automaton $M_d$ with $t_d^2$ states. In this case, our algorithm takes $O(t_d^2 nm)$ time. Using $M_d$ results in faster computation than using $M$ when $t_d$
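To make the deterministic case concrete, the following is a minimal sketch of a dynamic program that tracks pairs of DFA states while aligning, which is the idea behind a state space of size $t_d^2$. It is not the paper's construction of $M_d$. The function name `constrained_alignment_score`, the dictionary encoding of the DFA (`delta`, `start`, `accepting`), the choice of global alignment, and the unit match, mismatch, and gap scores are all assumptions made for illustration. The sketch uses three phases per cell: before the constrained segment, inside it (one score per pair of DFA states), and after it.

```python
NEG = float("-inf")

def constrained_alignment_score(s1, s2, delta, start, accepting,
                                match=1, mismatch=-1, gap=-1):
    """Best global alignment score over alignments containing a segment whose
    s1 part and s2 part are both accepted by the DFA (assumed encoding)."""
    n, m = len(s1), len(s2)

    def sub(a, b):
        return match if a == b else mismatch

    def extend(table, i, j, seed):
        # Standard alignment recurrence; `seed` is a score available at (i, j).
        best = seed
        if i and j:
            best = max(best, table[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]))
        if i:
            best = max(best, table[i - 1][j] + gap)
        if j:
            best = max(best, table[i][j - 1] + gap)
        return best

    before = [[NEG] * (m + 1) for _ in range(n + 1)]   # segment not started
    after = [[NEG] * (m + 1) for _ in range(n + 1)]    # segment finished
    # inside[i][j] maps a state pair (p, q) to the best score of an open segment
    inside = [[{} for _ in range(m + 1)] for _ in range(n + 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            before[i][j] = extend(before, i, j, 0 if i == 0 and j == 0 else NEG)

            # The segment may open here with both DFA copies in the start state.
            cell = {(start, start): before[i][j]}
            moves = []
            if i and j:
                a, b = s1[i - 1], s2[j - 1]
                moves.append((inside[i - 1][j - 1], a, b, sub(a, b)))
            if i:
                moves.append((inside[i - 1][j], s1[i - 1], None, gap))
            if j:
                moves.append((inside[i][j - 1], None, s2[j - 1], gap))
            for prev, a, b, weight in moves:
                for (p, q), score in prev.items():
                    # A gap leaves the DFA copy of the skipped string unchanged.
                    p2 = p if a is None else delta.get((p, a))
                    q2 = q if b is None else delta.get((q, b))
                    if p2 is None or q2 is None:
                        continue  # missing transition: dead state
                    if score + weight > cell.get((p2, q2), NEG):
                        cell[(p2, q2)] = score + weight
            inside[i][j] = cell

            # The segment may close here if both DFA copies are accepting.
            closed = max((v for (p, q), v in cell.items()
                          if p in accepting and q in accepting), default=NEG)
            after[i][j] = extend(after, i, j, closed)

    return after[n][m]

# Hypothetical DFA accepting exactly the string "ab".
delta_ab = {(0, "a"): 1, (1, "b"): 2}
print(constrained_alignment_score("xaby", "xaby", delta_ab, 0, {2}))  # → 4
```

Each cell stores at most one score per pair of DFA states, so the work per cell grows with the square of the number of DFA states under these assumptions. The function returns negative infinity when no alignment satisfies the constraint.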