DNA Motif Representation with Nucleotide Dependency
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
ML-Consensus: a general consensus model for variable-length transcription factor binding sites
EvoBIO'11 Proceedings of the 9th European conference on Evolutionary computation, machine learning and data mining in bioinformatics
Geometric visualization of TF binding sites in context
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Conservation patterns in cis-elements reveal compensatory mutations
RCG'06 Proceedings of the RECOMB 2006 international conference on Comparative Genomics
Motif yggdrasil: sampling from a tree mixture model
RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology
Hi-index | 3.84 |
Motivation: Positional weight matrix (PWM) is derived from a set of experimentally determined binding sites. Here we explore whether there exist subclasses of binding sites and if the mixture of these subclass-PWMs can improve the binding site prediction. Intuitively, the subclasses correspond to either distinct binding preference of the same transcription factor in different contexts or distinct subtypes of the transcription factor. Result: We report an Expectation Maximization algorithm adapting the mixture model of Baily and Elkan. We assessed the relative merit of using two subclass-PWMs. The resulting PWMs were evaluated with respect to preferred conservation (relative to mouse) of potential sites in human promoters and expression coherence of the potential target genes. Based on 64 JASPAR vertebrate PWMs, 61--81% of the cases resulted in a higher conservation using the mixture model. Also in 98% of the cases the expression coherence was higher for the target genes of one of the subclass-PWMs. Our analysis of Reb1 sites is consistent with previously discovered subtypes using independent methods. Additionally application of our method to mutated sites for transcription factor LEU3 reveals subclasses that segregate into strongly binding and weakly binding sites with P-value of 0.008. This is the first study which attempts to quantify the subtly different binding specificities of a transcription factor on a large scale and suggests the use of a mixture of PWMs, instead of the current practice of using a single PWM, for a transcription factor. Availability: Contact: sridharh@pcbi.upenn.edu