Research Article: Kernel-based data fusion improves the drug-protein interaction prediction

  • Authors:
  • Yong-Cui Wang;Chun-Hua Zhang;Nai-Yang Deng;Yong Wang

  • Affiliations:
  • Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Science, Xining 810001, China;Information School, Renmin University of China, Beijing 100872, China;College of Science, China Agricultural University, Beijing 100083, China;National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, CAS, Beijing 100190, China

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2011

Abstract

Proteins are involved in almost every process in every organism, and many of these processes depend on interactions with other molecules, including small-molecule drugs. Computationally predicting drug-protein interactions is particularly important for speeding up the development of novel drugs. To borrow information from known drug-protein interactions, we need to define the similarity among proteins and the similarity among drugs. These similarities are usually defined from a single data source, and many such methods have been proposed. However, the availability of many genomic and chemogenomic data sources allows us to integrate them to improve predictions, which raises a major challenge: how to integrate these heterogeneous data sources. Here, we propose a kernel-based method to predict drug-protein interactions by integrating multiple types of data. Specifically, we collect drug pharmacological and therapeutic effects, drug chemical structures, and protein genomic information to characterize drug-target interactions, and then integrate them through a kernel function within a support vector machine (SVM)-based predictor. With this data fusion technique, we infer drug-protein interactions from a collection of data sources. Our new method is validated on four classes of drug target proteins: enzymes, ion channels (ICs), G-protein coupled receptors (GPCRs), and nuclear receptors (NRs). We find that every single data source is predictive and that integrating different data sources improves accuracy; that is, at the same false positive rate, data integration uncovers more experimentally observed drug-target interactions than methods based on a single data source. Functional annotation analysis indicates that our new predictions are worth future experimental validation. In conclusion, our new method can efficiently integrate diverse data sources and should promote further research in drug discovery.
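The abstract does not spell out how the kernels are combined. As a minimal sketch, it assumes three things. First, each data source yields a similarity (kernel) matrix. Second, sources describing the same kind of object are fused by a weighted sum of cosine-normalized kernels. Third, the drug and protein kernels are joined by the tensor-product pairwise kernel commonly used for drug-target prediction. Under those assumptions, a fused kernel over drug-protein pairs could be built as below. The helpers `normalize`, `fuse`, and `pairwise_kernel`, the equal weights, and the toy matrices are all hypothetical. They are not the authors' code or data.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def normalize(k: Matrix) -> Matrix:
    """Cosine-normalize a kernel so every object has self-similarity 1.
    Assumes a strictly positive diagonal."""
    n = len(k)
    return [[k[i][j] / (k[i][i] * k[j][j]) ** 0.5 for j in range(n)] for i in range(n)]


def fuse(kernels: List[Matrix], weights: Optional[List[float]] = None) -> Matrix:
    """Weighted sum of normalized kernels defined over the same objects.

    A nonnegative combination of positive semidefinite kernels is itself
    positive semidefinite, so the result is still a valid SVM kernel.
    """
    if weights is None:
        # Hypothetical choice: equal weights for every data source.
        weights = [1.0 / len(kernels)] * len(kernels)
    normed = [normalize(k) for k in kernels]
    n = len(normed[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, normed)) for j in range(n)]
            for i in range(n)]


def pairwise_kernel(k_drug: Matrix, k_prot: Matrix,
                    pairs: List[Tuple[int, int]]) -> Matrix:
    """Tensor-product kernel on (drug, protein) pairs:
    K((d, p), (d2, p2)) = Kd(d, d2) * Kp(p, p2)."""
    return [[k_drug[d][d2] * k_prot[p][p2] for (d2, p2) in pairs]
            for (d, p) in pairs]


# Hypothetical toy similarities for 3 drugs and 2 proteins.
drug_chemical = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]
drug_effects = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 2.0]]
protein_genomic = [[4.0, 1.0], [1.0, 1.0]]

k_drug = fuse([drug_chemical, drug_effects])  # drug structure + drug effects
k_prot = fuse([protein_genomic])              # single protein source, normalized
pairs = [(0, 0), (1, 0), (2, 1)]              # (drug index, protein index) pairs
K = pairwise_kernel(k_drug, k_prot, pairs)
for row in K:
    print([round(x, 4) for x in row])
```

The resulting matrix `K` could then be passed to any SVM that accepts a precomputed Gram matrix, such as scikit-learn's `SVC(kernel="precomputed")`, with labels marking known interactions. The sum keeps each source's contribution additive within drugs or within proteins. The product across the two sides makes two pairs similar only when both their drugs and their proteins are similar.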