Identification of distinguishing motifs

Authors:
WangSen Feng;Zhanyong Wang;Lusheng Wang
Affiliations:
Department of Computer Science, Peking University, People's Republic of China;Department of Computer Science, City University of Hong Kong, Hong Kong;Department of Computer Science, City University of Hong Kong, Hong Kong
Venue:
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Year:
2007

Abstract

Motivation: Motif identification in sequences has many important applications in biological studies, e.g., diagnostic probe design, locating binding sites and regulatory signals, and identifying potential drug targets. The problem has two versions. (1) Single group: given a group of n sequences, find a length-l motif that appears in each of the given sequences such that its occurrences are similar to one another. (2) Two groups: given two groups of sequences B and G, find a length-l (distinguishing) motif that appears in every sequence in B and does not appear anywhere in the sequences in G. Here, the occurrences of the motif in the given sequences may contain errors. Most existing programs can handle only the single-group case. Moreover, it is very difficult to use edit distance (allowing indels and replacements) for motif detection.

Results: (1) We propose a randomized algorithm for the single-group problem that can handle indels in the occurrences of the motif. (2) We give an algorithm for the two-group problem. (3) We evaluate the algorithms with extensive simulations.
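
To make the two-group definition concrete, the following minimal Python sketch checks whether a given candidate motif is distinguishing under edit distance. It is only a verifier for the definition, not the algorithm proposed in the paper. The function names and the two thresholds `d_b` (maximum error allowed for an occurrence in B) and `d_g` (minimum error required against every sequence in G) are assumptions introduced for illustration; the abstract does not state how errors are bounded.

```python
from typing import Sequence


def min_edit_distance_to_substring(motif: str, seq: str) -> int:
    """Smallest edit distance between `motif` and any substring of `seq`.

    Standard approximate-matching dynamic program: row 0 is all zeros so
    that an occurrence may start at any position of `seq` at no cost.
    """
    prev = [0] * (len(seq) + 1)
    for i, mc in enumerate(motif, start=1):
        # Column 0: the first i motif characters against an empty substring.
        curr = [i] + [0] * len(seq)
        for j, sc in enumerate(seq, start=1):
            cost = 0 if mc == sc else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or replacement
                prev[j] + 1,         # motif character missing from seq
                curr[j - 1] + 1,     # extra character in seq
            )
        prev = curr
    # The occurrence may end at any position of `seq`.
    return min(prev)


def is_distinguishing(
    motif: str,
    group_b: Sequence[str],
    group_g: Sequence[str],
    d_b: int,  # assumed threshold: max error for an occurrence in B
    d_g: int,  # assumed threshold: min error against every sequence in G
) -> bool:
    """True if `motif` occurs (within d_b) in every B sequence and is at
    distance at least d_g from every substring of every G sequence."""
    in_all_b = all(min_edit_distance_to_substring(motif, s) <= d_b for s in group_b)
    in_no_g = all(min_edit_distance_to_substring(motif, s) >= d_g for s in group_g)
    return in_all_b and in_no_g
```

A caller would pass a candidate motif, the two groups, and the two thresholds, for example `is_distinguishing("ACGT", B, G, d_b=1, d_g=3)`. Each check costs time proportional to the motif length times the total length of the sequences, so this sketch is practical only for verifying candidates, not for searching over all length-l strings.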