Efficient algorithms for regular expression constrained sequence alignment

Authors:
Yun-Sheng Chung;Chin Lung Lu;Chuan Yi Tang
Affiliations:
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 300, ROC;Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan 300, ROC;Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 300, ROC
Venue:
Information Processing Letters
Year:
2007

References (8):
A linear space algorithm for computing maximal common subsequences (Communications of the ACM)
Current Topics in Computational Molecular Biology
The constrained longest common subsequence problem (Information Processing Letters)
A simple algorithm for the constrained sequence problems (Information Processing Letters)
MuSiC: a tool for multiple sequence alignment with constraints (Bioinformatics)
A memory-efficient algorithm for multiple sequence alignment with constraints (Bioinformatics)
Introduction to Automata Theory, Languages, and Computation (3rd Edition)
Regular expression constrained sequence alignment (CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching)

Cited by (2):
Regular language constrained sequence alignment revisited (IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms)
Efficient parallel algorithm for multiple sequence alignments with regular expression constraints on graphics processing units (International Journal of Computational Science and Engineering)

Abstract

Imposing constraints is an effective means of incorporating biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Motivated by this, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naively, can take up to O(|Σ|^2 |V|^4 n^2) time and O(|Σ|^2 |V|^4 n) space, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm for this problem which takes O(|V|^3 n^2) time and O(|V|^2 n) space in the worst case. For the case |V| = O(log n) we propose another algorithm with time complexity O(|V|^2 log|V| n^2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications, where the formulation of this problem originates: with 20 amino acids, the factor |Σ|^2 = 400 in the time complexity of the previously proposed algorithm would significantly affect efficiency in practice.
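To make the problem statement concrete, here is a minimal Python sketch of a constrained alignment score. It is not the algorithm of this paper, nor Arslan's; it rests on several assumptions that the abstract does not state. It assumes the constraint is given as a deterministic automaton (a hypothetical `delta` dictionary), that the constraint means "some contiguous block of alignment columns spells an accepted string on both rows", and that scoring is a simple match/mismatch/gap scheme. All names (`constrained_alignment_score`, `DELTA`, `ACCEPTING`) are illustrative.

```python
from functools import lru_cache

NEG_INF = float("-inf")

def constrained_alignment_score(s1, s2, delta, start, accepting,
                                match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 such that some contiguous
    block of columns spells a DFA-accepted string on each row.
    Returns -inf when no alignment satisfies the constraint."""
    n, m = len(s1), len(s2)

    def step(state, ch):
        # A missing transition means the DFA rejects (returns None).
        return delta.get((state, ch))

    @lru_cache(maxsize=None)
    def best(i, j, phase, p, q):
        # phase 0: before the constrained block, 1: inside it, 2: after it.
        # p and q are the DFA states for the s1 row and the s2 row
        # (only meaningful in phase 1, otherwise None).
        if phase == 2 and i == n and j == m:
            return 0
        options = []
        if phase == 0:
            # Open the constrained block here without consuming anything.
            options.append(best(i, j, 1, start, start))
        if phase == 1 and p in accepting and q in accepting:
            # Both rows have spelled an accepted string: close the block.
            options.append(best(i, j, 2, None, None))
        inside = phase == 1
        if i < n and j < m:  # column (s1[i], s2[j])
            p2 = step(p, s1[i]) if inside else None
            q2 = step(q, s2[j]) if inside else None
            if not inside or (p2 is not None and q2 is not None):
                s = match if s1[i] == s2[j] else mismatch
                options.append(s + best(i + 1, j + 1, phase, p2, q2))
        if i < n:  # column (s1[i], -)
            p2 = step(p, s1[i]) if inside else None
            if not inside or p2 is not None:
                options.append(gap + best(i + 1, j, phase, p2, q))
        if j < m:  # column (-, s2[j])
            q2 = step(q, s2[j]) if inside else None
            if not inside or q2 is not None:
                options.append(gap + best(i, j + 1, phase, p, q2))
        return max(options, default=NEG_INF)

    return best(0, 0, 0, None, None)

# Hypothetical DFA for the regular expression AB*C.
DELTA = {(0, "A"): 1, (1, "B"): 1, (1, "C"): 2}
ACCEPTING = frozenset({2})

print(constrained_alignment_score("ABC", "AC", DELTA, 0, ACCEPTING))   # 1
print(constrained_alignment_score("ABC", "AGC", DELTA, 0, ACCEPTING))  # -inf
```

With a deterministic automaton of |Q| states this sketch visits O(|Q|^2 n^2) table entries and never iterates over the alphabet, which illustrates why a running time independent of Σ is achievable. The bounds in the abstract are stated in terms of the states V of an automaton equivalent to the regular expression; an equivalent deterministic automaton can be exponentially larger than a nondeterministic one, so the two bounds are not directly comparable. The memoized recursion is meant for short inputs only.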