In silico identification of potential drug targets is a crucial task in drug discovery. Traditional approaches use only protein sequence or structural information to predict drug targets, and have achieved limited success. Because cellular proteins function within interaction networks, interacting with other cellular macromolecules, analyzing the topological features of proteins in such networks reveals important insights into their potential druggability. In this paper, we first introduce ten novel topological features extracted from the human protein-protein interaction network. In designing these features, we place particular emphasis on the roles of three disease-related groups of proteins: known drug targets, disease genes, and essential genes. Based on these network features, we build models that predict drug targets with up to 80% classification accuracy using support vector machines, L1-regularized logistic regression, and k-nearest neighbors, and we analyze the relevance of each feature to a protein's druggability. Moreover, combining our network features with a set of protein sequence features yields more robust experimental performance. Within this framework of integrated network and sequence features, our method can also be used to prioritize multiple candidate proteins according to their predicted druggability.
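To make the pipeline described above more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not list the ten topological features, so the four used here (degree, and the fraction of a protein's neighbors that are known drug targets, disease genes, or essential genes) are illustrative guesses at the general idea. The toy network, the protein names `P1` to `P6`, the group memberships, and the helper functions are all hypothetical. Of the three classifiers named in the abstract, only k-nearest neighbors is sketched, written in plain Python so the example is self-contained.

```python
# Toy undirected protein-protein interaction network (hypothetical data).
EDGES = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"),
         ("P4", "P5"), ("P5", "P6"), ("P4", "P6")]

# Hypothetical memberships in the three disease-related groups.
KNOWN_TARGETS = {"P1", "P2"}
DISEASE_GENES = {"P2", "P3"}
ESSENTIAL_GENES = {"P5"}


def build_adjacency(edges):
    """Map each protein to the set of its direct interaction partners."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def topological_features(protein, adj):
    """Illustrative features: degree plus the fraction of neighbors in each group."""
    nbrs = adj.get(protein, set())
    degree = len(nbrs)

    def frac(group):
        return len(nbrs & group) / degree if degree else 0.0

    return [float(degree), frac(KNOWN_TARGETS), frac(DISEASE_GENES), frac(ESSENTIAL_GENES)]


def knn_score(x, train, k=3):
    """Fraction of the k nearest labeled proteins (squared Euclidean) that are targets."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0]))
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


def rank_candidates(candidates, features, train, k=3):
    """Sort candidate proteins by predicted druggability score, highest first."""
    scored = [(p, knn_score(features[p], train, k)) for p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


adj = build_adjacency(EDGES)
features = {p: topological_features(p, adj) for p in adj}

# Labeled examples: 1 = known drug target, 0 = assumed non-target (hypothetical labels).
train = [(features["P1"], 1), (features["P2"], 1),
         (features["P5"], 0), (features["P6"], 0)]

ranked = rank_candidates(["P3", "P4"], features, train)
for protein, score in ranked:
    print(f"{protein}: {score:.2f}")
```

In a realistic setting the features would be scaled before computing distances, sequence-derived features would be appended to each feature vector, and accuracy would be estimated by cross-validation. The sketch omits all of these for brevity.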