We give a probabilistic algorithm for Consensus Sequence, an NP-complete subproblem of motif recognition that can be described as follows: given a set of sequences of length l, determine whether there exists a sequence that has Hamming distance at most d from every sequence in the set. We demonstrate that the distance between a randomly selected majority sequence and a consensus sequence decreases as the size of the data set increases. Applying our probabilistic paradigms and insights to motif recognition, we develop pMCL-WMR, a program capable of detecting motifs in large synthetic and real genomic data sets. Our results show that detecting motifs becomes easier and more efficient as the number of sequences in the data set increases, a surprising and counter-intuitive fact that has significant impact on this deeply investigated area.
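To make the problem statement concrete, the following is a minimal Python sketch of the two notions the abstract relies on: a majority sequence and the Consensus Sequence check. It is not the pMCL-WMR implementation. The function names (`hamming`, `random_majority_sequence`, `is_consensus`) are hypothetical, and the sketch assumes that "randomly selected majority sequence" means ties at a position are broken uniformly at random among the most frequent symbols.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def random_majority_sequence(seqs, rng=random):
    """Build a majority sequence column by column.

    Assumption: when several symbols tie for most frequent in a column,
    one of them is chosen uniformly at random.
    """
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        top = max(counts.values())
        candidates = sorted(s for s, c in counts.items() if c == top)
        out.append(rng.choice(candidates))
    return "".join(out)


def is_consensus(candidate, seqs, d):
    """True if candidate is within Hamming distance d of every sequence."""
    return all(hamming(candidate, s) <= d for s in seqs)


# Toy input: three sequences of length l = 4.
seqs = ["ACGT", "ACGA", "ACCT"]
majority = random_majority_sequence(seqs)
print(majority, is_consensus(majority, seqs, d=1))
```

A majority sequence is only a candidate: it need not be within distance d of every input, so the check in `is_consensus` is still required. The abstract's claim concerns how close such a candidate tends to be to a true consensus sequence as the number of input sequences grows.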