Protein classification using transductive learning on phylogenetic profiles

  • Authors:
  • Roger Craig; Li Liao

  • Affiliations:
  • University of Delaware, Newark, DE; University of Delaware, Newark, DE

  • Venue:
  • Proceedings of the 2006 ACM Symposium on Applied Computing
  • Year:
  • 2006

Abstract

Phylogenetic profiles of proteins (strings of ones and zeros encoding, respectively, the presence and absence of a protein in a group of genomes) have recently been used to identify homologous proteins and/or proteins that are functionally linked, such as proteins participating in the same metabolic pathway. We propose a new learning method for protein classification based on phylogenetic profiles that takes into account both the structure of the phylogenetic tree and the likelihood of a protein's presence in each genome. The method extends the profiles with extra bits encoding the phylogenetic tree, whose interior nodes, representing hypothetical ancestral genomes, are scored to reflect the chance that divergence develops among their descendants. The scoring scheme also incorporates the likelihood of a protein's presence in each genome as weighting factors; these are initially estimated from the training data and integrated into the kernel of a support vector machine (SVM). In a transductive learning scheme, when the SVM classifies the test data, the weighting factors are updated iteratively using the predicted labels. We tested the method on the proteome of Saccharomyces cerevisiae, using the MIPS classification as a benchmark, and the results show a substantial increase in classification accuracy.
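The pipeline described in the abstract (tree-extended profiles, genome weights inside the kernel, and a transductive weight update) can be sketched in Python on a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the interior-node scoring formula, nor the weight estimator, nor the SVM settings. The sketch therefore assumes that (1) each interior node scores the weighted fraction of its descendant genomes that contain the protein, (2) each genome's weight is the Laplace-smoothed fraction of positive-class proteins present in it, and (3) a linear kernel perceptron stands in for the SVM. All function names, the four-genome tree, and the profiles are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, int]     # genome name -> 1 (present) / 0 (absent)
Tree = Dict[str, List[str]]  # interior node -> children (leaves are genomes)


def leaves_under(node: str, tree: Tree) -> List[str]:
    """Genomes (leaves) below a node of the phylogenetic tree."""
    if node not in tree:
        return [node]
    return [leaf for child in tree[node] for leaf in leaves_under(child, tree)]


def estimate_weights(profiles: List[Profile], labels: List[int],
                     genomes: List[str]) -> Dict[str, float]:
    """ASSUMPTION: weight of genome g = Laplace-smoothed fraction of
    positive-class proteins present in g (so every weight is > 0)."""
    pos = [p for p, y in zip(profiles, labels) if y == 1]
    return {g: (sum(p[g] for p in pos) + 1) / (len(pos) + 2) for g in genomes}


def extend(profile: Profile, tree: Tree, w: Dict[str, float]) -> List[float]:
    """Leaf bits followed by one score per interior node (ancestral genome).
    ASSUMPTION: a node scores the weighted fraction of its descendant
    genomes containing the protein; this is a stand-in score."""
    feats = [float(profile[g]) for g in sorted(profile)]
    for node in sorted(tree):
        leaves = leaves_under(node, tree)
        total = sum(w[g] for g in leaves)
        feats.append(sum(w[g] * profile[g] for g in leaves) / total)
    return feats


def dot(a: List[float], b: List[float]) -> float:
    """Linear kernel on extended profiles."""
    return sum(x * y for x, y in zip(a, b))


def train_perceptron(X: List[List[float]], y: List[int],
                     epochs: int = 20) -> Tuple[List[int], float]:
    """Kernel perceptron (labels +1/-1), used here in place of an SVM."""
    alpha, b = [0] * len(X), 0.0
    for _ in range(epochs):
        for i, xi in enumerate(X):
            s = sum(a * yj * dot(xj, xi) for a, yj, xj in zip(alpha, y, X)) + b
            if y[i] * s <= 0:  # misclassified: strengthen this example
                alpha[i] += 1
                b += y[i]
    return alpha, b


def predict(alpha: List[int], b: float, X: List[List[float]],
            y: List[int], x: List[float]) -> int:
    s = sum(a * yj * dot(xj, x) for a, yj, xj in zip(alpha, y, X)) + b
    return 1 if s > 0 else -1


def transductive_classify(train_p: List[Profile], y: List[int],
                          test_p: List[Profile], tree: Tree,
                          rounds: int = 3) -> List[int]:
    genomes = sorted(train_p[0])
    w = estimate_weights(train_p, y, genomes)  # initial weights: training data only
    preds: List[int] = []
    for _ in range(rounds):
        X = [extend(p, tree, w) for p in train_p]
        alpha, b = train_perceptron(X, y)
        preds = [predict(alpha, b, X, y, extend(p, tree, w)) for p in test_p]
        # Transductive step: re-estimate weights using predicted test labels too.
        w = estimate_weights(train_p + test_p, y + preds, genomes)
    return preds


# Hypothetical toy data: four genomes, two clades.
tree = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
train_p = [{"A": 1, "B": 1, "C": 0, "D": 0}, {"A": 1, "B": 0, "C": 0, "D": 0},
           {"A": 0, "B": 0, "C": 1, "D": 1}, {"A": 0, "B": 0, "C": 0, "D": 1}]
y = [1, 1, -1, -1]
test_p = [{"A": 0, "B": 1, "C": 0, "D": 0}, {"A": 0, "B": 0, "C": 1, "D": 0}]
preds = transductive_classify(train_p, y, test_p, tree)
print(preds)  # [1, -1]
```

On this toy data, the test protein found only in genome B (the sister of A) is labeled positive and the one found only in C is labeled negative. Replacing the perceptron with an SVM on a precomputed weighted kernel would leave the transductive loop itself unchanged.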