This paper takes a new view of motif discovery, addressing a common problem in existing motif finders. A motif is treated as a feature of the input promoter regions that leads to a good classifier between these promoters and a set of background promoters. This perspective allows us to adapt existing methods of feature selection, a well-studied topic in machine learning, to motif discovery. We develop a general algorithmic framework that can be specialized to work with a wide variety of motif models, including consensus models with degenerate symbols or mismatches, and composite motifs. A key feature of our algorithm is that it measures over-representation while maintaining information about the distribution of motif instances in individual promoters. The assessment of a motif's discriminative power is normalized against chance behaviour by a probabilistic analysis. We apply our framework to two popular motif models, and are able to detect several known binding sites in sets of co-regulated genes in yeast.
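To make the "motif as a classifier feature" idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a consensus motif with IUPAC degenerate symbols, reduces each promoter to a presence/absence feature, and normalizes against chance with a hypergeometric tail probability. The abstract does not say which statistic the authors use, so the hypergeometric test is a swapped-in stand-in. All function names (`motif_regex`, `hypergeom_tail`, `score_motif`) and the toy sequences are hypothetical.

```python
import math
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_regex(consensus):
    """Compile a consensus string with degenerate symbols into a regex."""
    return re.compile("".join(IUPAC[symbol] for symbol in consensus))


def hypergeom_tail(k, n_pos, n_hit, n_total):
    """P(X >= k) when n_pos promoters are drawn at random from n_total,
    of which n_hit contain the motif (hypergeometric distribution)."""
    upper = min(n_pos, n_hit)
    denom = math.comb(n_total, n_pos)
    numer = sum(
        math.comb(n_hit, i) * math.comb(n_total - n_hit, n_pos - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def score_motif(consensus, positives, background):
    """Chance-normalized score (smaller = more discriminative) for how well
    motif presence separates the positive promoters from the background."""
    pattern = motif_regex(consensus)
    # Presence/absence feature per promoter (an assumption of this sketch).
    k = sum(pattern.search(seq) is not None for seq in positives)
    b = sum(pattern.search(seq) is not None for seq in background)
    return hypergeom_tail(k, len(positives), k + b,
                          len(positives) + len(background))


# Toy data, invented for illustration only.
positives = ["CACGTGACC", "TTCACGTGA", "GGCACGTGTT"]
background = ["AAAAAAA", "CCCCCCC", "GGGTTTT", "ACGTTTT"]
p_value = score_motif("CACGTG", positives, background)
```

A candidate motif with a small score occurs in the co-regulated promoters more often than chance would explain. Unlike the framework described above, this sketch discards the number and distribution of motif instances within each promoter. A fuller version would keep per-promoter counts as the feature value.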