Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization
Machine Learning - Special issue on applications in molecular biology
Finding motifs using random projections
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Finding motifs in the twilight zone
Proceedings of the sixth annual international conference on Computational biology
From promoter sequence to expression: a probabilistic framework
Proceedings of the sixth annual international conference on Computational biology
Combinatorial Approaches to Finding Subtle Signals in DNA Sequences
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Bases of Motifs for Generating Repeated Patterns with Wild Cards
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Masking patterns in sequences: A new class of motif discovery with don't cares
Theoretical Computer Science
Improving motif refinement using hybrid expectation maximization and random projection
ISB '10 Proceedings of the International Symposium on Biocomputing
Improved pattern-driven algorithms for motif finding in DNA sequences
RECOMB'05 Proceedings of the 2005 joint annual satellite conference on Systems biology and regulatory genomics
Motif yggdrasil: sampling from a tree mixture model
RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology
Hi-index | 0.00 |
An important part of deciphering gene regulatory mechanisms is discovering transcription factor binding sites. In many cases, these sites can be detected because they are often overrepresented in genomic sequences. The detection of the overrepresented signals in sequences, or motif-finding has become a central problem in computational biology. There are two major computational frameworks for attacking the motif finding problem which differ in their representation of the signals. The most popular is the profile or PSSM (Position Specific Scoring Matrix) representation. The goal of these algorithms is to obtain probabilistic representations of the overrepresented signals. Another is the consensus pattern or pattern with mismatches representation which represents a signal as discrete consensus pattern and allows some mismatches to occur in each instance of the pattern. The advantage of profiles is the expressiveness of their representation while the advantage of the consensus pattern approach is the existence of efficient algorithms that guarantee discovery of the best patterns. In this paper we present a unified framework for motif finding which encompasses both the profile representation and the consensus pattern representation. We prove that the problem of discovering the best profiles can be solved by considering a degenerate version of the problem of finding the best consensus patterns. The main advantage of our framework is that it motivates a novel algorithm, MITRA-PSSM, which discovers profiles, yet provides some of the guarantees of discovering the best signals. The algorithm searches for best profiles with respect to information content which is the same criterion of popular algorithms such as MEME and CONSENSUS. MITRA-PSSM is specifically designed for searching for profiles in this framework and introduces a novel notion of scoring consensus patterns, discrete information content. MITRA-PSSM is available for public use via webserver at http://www.calit2.net/compbio/mitra/.