New bounds for motif finding in strong instances

Authors:
Broňa Brejová; Daniel G. Brown; Ian M. Harrower; Tomáš Vinař
Affiliations:
David R. Cheriton School of Computer Science, University of Waterloo (all four authors)
Venue:
CPM'06: Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching
Year:
2006

References (4):
Randomized Distributed Edge Coloring via an Extension of the Chernoff–Hoeffding Bounds. SIAM Journal on Computing.
Finding similar regions in many sequences. Journal of Computer and System Sciences (STOC 1999).
Combinatorial Approaches to Finding Subtle Signals in DNA Sequences. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology.
Sharper upper and lower bounds for an approximation scheme for consensus-pattern. CPM'05: Proceedings of the 16th Annual Conference on Combinatorial Pattern Matching.

Cited by (2):
On the Structure of Small Motif Recognition Instances. SPIRE'08: Proceedings of the 15th International Symposium on String Processing and Information Retrieval.
Why large CLOSEST STRING instances are easy to solve in practice. SPIRE'10: Proceedings of the 17th International Conference on String Processing and Information Retrieval.

Abstract

Many motif-finding algorithms commonly used in bioinformatics start by sampling r potential motif occurrences from n input sequences. The motif is derived from these samples and evaluated on all sequences. This approach works extremely well in practice and is implemented by several programs. Li, Ma and Wang have shown that a simple algorithm of this sort is a polynomial-time approximation scheme. However, in 2005, we exhibited specific instances of the motif-finding problem for which the approximation ratio of a slight variation of this scheme converges to one very slowly as a function of the sample size r, which seemingly contradicts the high performance of sample-based algorithms. Here, we account for the difference by showing that, for several different definitions of “strong” binary motifs, the approximation ratio of sample-based algorithms converges to one exponentially fast in r. We also describe “very strong” motifs, for which the simple sample-based approach always identifies the correct motif, even for modest values of r.
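The sample-based scheme described in the abstract can be illustrated with a small brute-force sketch. It is a simplified illustration, not the exact algorithm analysed by Li, Ma and Wang or in this paper, and it rests on several assumptions: binary input strings, a fixed motif length L, the consensus-pattern objective (sum over sequences of the Hamming distance from the motif to its closest length-L window), and column-wise majority voting with ties broken toward the smaller symbol. It exhaustively tries every r-tuple of candidate windows instead of drawing random samples. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions where the equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_consensus(samples: list[str]) -> str:
    """Column-wise majority string of the samples.

    Assumption: ties are broken toward the lexicographically smaller symbol.
    """
    consensus = []
    for column in zip(*samples):
        counts = Counter(column)
        best = max(counts.values())
        consensus.append(min(s for s, c in counts.items() if c == best))
    return "".join(consensus)


def best_occurrences(motif: str, sequences: list[str]) -> tuple[list[str], int]:
    """For each sequence, pick the length-L window closest to the motif.

    Returns the chosen windows and the total Hamming cost (assumed
    consensus-pattern objective).
    """
    L = len(motif)
    chosen, cost = [], 0
    for seq in sequences:
        windows = [seq[i:i + L] for i in range(len(seq) - L + 1)]
        w = min(windows, key=lambda x: hamming(motif, x))
        chosen.append(w)
        cost += hamming(motif, w)
    return chosen, cost


def sample_based_motif(sequences: list[str], L: int, r: int) -> tuple[str, int]:
    """Try every r-tuple (with repetition) of length-L windows from the input,
    derive the majority consensus of the tuple, evaluate it on all sequences,
    and keep the cheapest motif (first one found on ties).
    """
    windows = [s[i:i + L] for s in sequences for i in range(len(s) - L + 1)]
    best_motif, best_cost = "", None
    for sample in product(windows, repeat=r):
        motif = majority_consensus(list(sample))
        _, cost = best_occurrences(motif, sequences)
        if best_cost is None or cost < best_cost:
            best_motif, best_cost = motif, cost
    return best_motif, best_cost
```

For example, on the toy binary instance `["1101", "0110", "1100"]`, where `110` occurs in every sequence, `sample_based_motif(["1101", "0110", "1100"], 3, 1)` returns `("110", 0)`. The exhaustive loop examines N^r tuples for N candidate windows, so the running time is polynomial for fixed r but grows quickly with r; this is why the rate at which the approximation ratio approaches one as a function of r matters in practice.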