Predicting binding sites from sequence can significantly help determine the function of uncharacterized proteins on a genomic scale. The task is highly challenging because of the enormous number of alternative candidate configurations. Previous research has considered this prediction problem only when starting from 3D information. When starting from sequence alone, the only available methods predict the bonding state of selected residues. The sole exception is pattern-based approaches, which rely on very specific motifs and therefore cannot discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best-scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, which allows us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from those used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds.
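To illustrate why a matroid structure makes greedy inference possible, here is a minimal sketch of the generic matroid greedy algorithm applied to ligand-ion bonds. It assumes a hypothetical encoding in which each candidate bond is a (residue index, ion index) pair with a precomputed score, and its independence oracle enforces only the one-ion-per-residue constraint stated above (a partition matroid). It is not the authors' scoring model or full output structure. The function names (`greedy_matroid`, `one_ion_per_residue`, `bond_precision_recall`) and the toy scores are invented for illustration. The block also computes bond-level precision and recall, the quantities reported above, on made-up data.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical encoding: a bond is (residue index, ion index).
Bond = Tuple[int, int]


def greedy_matroid(
    elements: Iterable[Bond],
    weight: Callable[[Bond], float],
    is_independent: Callable[[List[Bond]], bool],
) -> List[Bond]:
    """Matroid greedy: returns a maximum-weight independent set
    whenever (elements, is_independent) defines a matroid."""
    chosen: List[Bond] = []
    for e in sorted(elements, key=weight, reverse=True):
        if weight(e) <= 0:
            break  # sorted descending, so no later element can raise the total
        if is_independent(chosen + [e]):
            chosen.append(e)
    return chosen


def one_ion_per_residue(bonds: List[Bond]) -> bool:
    """Assumed constraint: each residue coordinates at most one ion.
    Partitioning bonds by residue with capacity 1 gives a partition matroid."""
    residues = [r for r, _ in bonds]
    return len(residues) == len(set(residues))


def bond_precision_recall(predicted: Iterable[Bond], true: Iterable[Bond]) -> Tuple[float, float]:
    """Precision and recall computed over individual ligand-ion bonds."""
    pred: Set[Bond] = set(predicted)
    gold: Set[Bond] = set(true)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy scores (made up): residue 0 prefers ion 0; residue 8 has a negative score.
scores = {(0, 0): 2.0, (0, 1): 1.5, (3, 0): 1.2, (5, 1): 0.9, (5, 0): 0.4, (8, 1): -0.3}
bonds = greedy_matroid(scores, scores.__getitem__, one_ion_per_residue)
print(bonds)  # [(0, 0), (3, 0), (5, 1)]

p, r = bond_precision_recall(bonds, {(0, 0), (3, 0), (7, 1)})
print(round(p, 3), round(r, 3))  # 0.667 0.667
```

The greedy loop touches the constraints only through the independence oracle, so another matroid's test can be swapped in unchanged. If the constraints do not form a matroid (for example, per-residue and per-ion capacities with labeled ions, which is an intersection of two partition matroids), the same loop is no longer guaranteed to return the optimum.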