Protein Function Prediction using Multi-label Ensemble Classification

  • Authors:
  • Guoxian Yu;Huzefa Rangwala;Carlotta Domeniconi;Guoji Zhang;Zhiwen Yu

  • Affiliations:
  • Southwest University, Beibei and South China University of Technology, Guangzhou;George Mason University, Fairfax;George Mason University, Fairfax;South China University of Technology, Guangzhou;South China University of Technology, Guangzhou

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2013

Abstract

High-throughput experimental techniques produce several heterogeneous proteomic and genomic datasets. To computationally annotate proteins, it is necessary and promising to integrate these heterogeneous data sources. Some methods transform these data sources into different kernels or feature representations. These kernels are then linearly (or non-linearly) combined into a composite kernel, which is used to build a predictive model that infers the functions of proteins. A protein can have multiple roles and functions (or labels). Therefore, multi-label learning methods are also adapted for protein function prediction. We develop a transductive multi-label classifier (TMC) to predict multiple functions of proteins using several unlabeled proteins. We also propose a method called transductive multi-label ensemble classifier (TMEC) for integrating the different data sources using an ensemble approach. TMEC trains a graph-based multi-label classifier on each single data source and then combines the predictions of the individual classifiers. We use a directed bi-relational graph to capture three types of relationships: between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the effectiveness of TMC and TMEC in predicting the functions of proteins on three benchmarks. We show that our approaches perform better than recently proposed protein function prediction methods on composite and multiple kernels.
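
The ensemble idea in the abstract (one graph-based multi-label classifier per data source, with the individual predictions then combined) can be illustrated with a minimal sketch. This is not the TMC or TMEC algorithm from the paper: it substitutes a generic label-propagation update on an undirected similarity graph for the directed bi-relational graph, and it combines sources by an unweighted average, which is an assumption. The function names (`row_normalize`, `propagate`, `ensemble_predict`), the parameters `alpha` and `iters`, and the toy matrices are all hypothetical.

```python
def row_normalize(W):
    """Turn a non-negative similarity matrix into a row-stochastic matrix."""
    out = []
    for row in W:
        s = sum(row)
        out.append([v / s if s > 0 else 0.0 for v in row])
    return out


def propagate(W, Y, alpha=0.5, iters=50):
    """Generic label propagation (a stand-in, not the paper's TMC).

    Repeats F <- alpha * P F + (1 - alpha) * Y, where P is the
    row-normalized similarity matrix and Y holds the known labels
    (all-zero rows for unlabeled proteins).
    """
    P = row_normalize(W)
    n, k = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [
            [
                alpha * sum(P[i][j] * F[j][c] for j in range(n))
                + (1 - alpha) * Y[i][c]
                for c in range(k)
            ]
            for i in range(n)
        ]
    return F


def ensemble_predict(sources, Y, alpha=0.5, iters=50):
    """Run one classifier per data source, then average the score matrices.

    The unweighted average is an assumption made for this sketch.
    """
    per_source = [propagate(W, Y, alpha, iters) for W in sources]
    n, k = len(Y), len(Y[0])
    return [
        [sum(F[i][c] for F in per_source) / len(per_source) for c in range(k)]
        for i in range(n)
    ]


# Toy data: 4 proteins, 2 functions. Proteins 0 and 1 are labeled,
# proteins 2 and 3 are unlabeled (all-zero rows).
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

# Two hypothetical data sources, each a symmetric protein similarity matrix.
W_a = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
W_b = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0.2], [0, 1, 0.2, 0]]

scores = ensemble_predict([W_a, W_b], Y)
for i, row in enumerate(scores):
    print(i, [round(v, 3) for v in row])
```

In this toy setup, unlabeled protein 2 is most similar to protein 0 in both sources, so its combined score for the first function exceeds its score for the second, and the reverse holds for protein 3. Because each source keeps its own graph, a noisy source can only affect the final scores through its share of the average, which is the usual motivation for combining at the prediction level rather than merging kernels first.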