Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
Modeling dependencies in protein-DNA binding sites
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Grafting: fast, incremental feature selection by gradient descent in function space
The Journal of Machine Learning Research
Feature selection, L1 vs. L2 regularization, and rotational invariance
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Efficient learning of Bayesian network classifiers: an extension to the TAN classifier
AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence
Hi-index | 0.00 |
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. In many cases this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on Markov networks. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our models, and devise an algorithm for learning their structural features from binding site data. We evaluate our approach on synthetic data, and then apply it to binding site and ChIP-chip data from yeast. We reveal sequence features that are present in the binding specificities of yeast TFs, and show that FMMs explain the binding data significantly better than PSSMs.