Discriminative motifs

Authors:
Saurabh Sinha
Affiliations:
University of Washington, Seattle, WA
Venue:
Proceedings of the sixth annual international conference on Computational biology
Year:
2002

Abstract

This paper takes a new view of motif discovery, addressing a common problem in existing motif finders. A motif is treated as a feature of the input promoter regions that leads to a good classifier between these promoters and a set of background promoters. This perspective allows us to adapt existing methods of feature selection, a well-studied topic in machine learning, to motif discovery. We develop a general algorithmic framework that can be specialized to work with a wide variety of motif models, including consensus models with degenerate symbols or mismatches, and composite motifs. A key feature of our algorithm is that it measures over-representation while maintaining information about the distribution of motif instances in individual promoters. The assessment of a motif's discriminative power is normalized against chance behaviour by a probabilistic analysis. We apply our framework to two popular motif models, and are able to detect several known binding sites in sets of co-regulated genes in yeast.
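
The idea of scoring a motif as a classification feature can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the threshold rule ("at least t instances means positive"), and the hypergeometric tail used as a stand-in for the probabilistic normalization are all assumptions made for illustration. The sketch counts instances of a consensus motif (with a mismatch allowance) in each promoter separately, so the per-promoter distribution is retained, then finds the count threshold that best separates input promoters from background promoters.

```python
from math import comb


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_instances(seq, motif, max_mismatches=0):
    """Count (possibly overlapping) windows of seq within max_mismatches of motif."""
    k = len(motif)
    return sum(
        1
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_mismatches
    )


def best_threshold_errors(pos_counts, neg_counts):
    """Return (errors, t) for the rule 'count >= t means positive' with fewest errors.

    Hypothetical scoring rule, chosen for illustration only.
    """
    best = None
    for t in range(1, max(pos_counts + neg_counts) + 2):
        errors = sum(c < t for c in pos_counts) + sum(c >= t for c in neg_counts)
        if best is None or errors < best[0]:
            best = (errors, t)
    return best


def hypergeom_tail(k, n_pos, n_neg, n_hit):
    """P(X >= k), X = positives among n_hit sequences drawn without replacement.

    Assumed stand-in for normalizing discriminative power against chance.
    """
    total = n_pos + n_neg
    upper = min(n_pos, n_hit)
    return sum(
        comb(n_pos, i) * comb(n_neg, n_hit - i) for i in range(k, upper + 1)
    ) / comb(total, n_hit)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    positives = ["ACGTACGT", "TTACGTTT", "ACGAACGT"]
    background = ["TTTTTTTT", "GGGGACGT", "CCCCCCCC"]
    pos_counts = [count_instances(s, "ACGT") for s in positives]
    neg_counts = [count_instances(s, "ACGT") for s in background]
    errors, t = best_threshold_errors(pos_counts, neg_counts)
    hits_pos = sum(c >= t for c in pos_counts)
    hits_neg = sum(c >= t for c in neg_counts)
    p = hypergeom_tail(hits_pos, len(positives), len(background), hits_pos + hits_neg)
    print(errors, t, p)
```

Keeping one count per promoter, rather than a single pooled total, is what lets a threshold rule distinguish a motif that appears once in every input promoter from one that appears many times in a single promoter.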