Protein classification using transductive learning on phylogenetic profiles

Authors:
Roger Craig;Li Liao
Affiliations:
University of Delaware, Newark, DE;University of Delaware, Newark, DE
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006

Abstract

Phylogenetic profiles of proteins (strings of ones and zeros that encode, respectively, the presence and absence of a protein in each genome of a group) have recently been used to identify homologous proteins and proteins that are functionally linked, for example by participating in a metabolic pathway. We propose a novel learning method for protein classification based on phylogenetic profiles, which takes into account both the phylogenetic tree structure and the likelihood of a protein's presence in each genome. The method extends each profile with extra bits that encode the phylogenetic tree: the interior nodes, which represent hypothetical ancestral genomes, are scored to reflect their chances of developing divergence in their descendants. The scoring scheme also incorporates the likelihood of a protein's presence in each genome as a weighting factor. These weighting factors are collected initially from the training data and integrated into the kernel of a support vector machine (SVM). In a transductive learning scheme, when the SVM is used to classify test data, the weighting factors are updated iteratively using the predicted results. We tested the method on the proteome of Saccharomyces cerevisiae, using the MIPS classification as a benchmark. The results show a marked increase in classification accuracy.
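
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the interior-node scoring rule, so the sketch assumes each interior node is scored as the mean of its children's scores; it assumes the weighting factors are per-genome presence frequencies among positive examples; and a nearest-centroid rule in the weighted feature space stands in for the SVM. All function names and the toy tree are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Node = Union[int, str]  # leaf = genome index, interior node = name


def presence_weights(profiles: Sequence[Sequence[int]]) -> List[float]:
    """Per-genome fraction of the given profiles in which the protein is present."""
    n = len(profiles)
    return [sum(p[g] for p in profiles) / n for g in range(len(profiles[0]))]


def extend_profile(profile: Sequence[int],
                   tree: Dict[str, List[Node]],
                   weights: Sequence[float]) -> List[float]:
    """Weight each leaf bit, then append one score per interior node.

    ASSUMPTION: an interior node's score is the mean of its children's scores.
    `tree` must list children before their parents (dicts keep insertion order).
    """
    scores: Dict[Node, float] = {g: bit * weights[g] for g, bit in enumerate(profile)}
    for name, children in tree.items():
        scores[name] = sum(scores[c] for c in children) / len(children)
    return [scores[g] for g in range(len(profile))] + [scores[n] for n in tree]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transductive_predict(train_pos, train_neg, test, tree, rounds=5):
    """Classify `test`, re-estimating the weights from the predictions each round."""
    weights = presence_weights(train_pos)  # initial weights from training data
    labels: List[int] = []
    for _ in range(rounds):
        def ext(p):
            return extend_profile(p, tree, weights)

        # Stand-in for the SVM: nearest class centroid in the weighted space.
        mu_pos = centroid([ext(p) for p in train_pos])
        mu_neg = centroid([ext(p) for p in train_neg])
        new_labels = [1 if sq_dist(ext(t), mu_pos) < sq_dist(ext(t), mu_neg) else -1
                      for t in test]
        if new_labels == labels:  # predictions are stable, so stop early
            break
        labels = new_labels
        # Transductive step: fold predicted positives into the weight estimate.
        predicted_pos = [t for t, y in zip(test, labels) if y == 1]
        weights = presence_weights(list(train_pos) + predicted_pos)
    return labels


# Toy example: four genomes, two clades A and B under a root (hypothetical tree).
toy_tree = {"A": [0, 1], "B": [2, 3], "root": ["A", "B"]}
print(transductive_predict(
    train_pos=[[1, 1, 0, 0], [1, 1, 1, 0]],
    train_neg=[[0, 0, 1, 1], [0, 0, 0, 1]],
    test=[[1, 1, 0, 1], [0, 0, 1, 1]],
    tree=toy_tree,
))
```

In the toy example the weights are first estimated from the two positive training profiles and then re-estimated once the first test profile is predicted positive; the loop stops when the predicted labels no longer change. Replacing the centroid rule with an SVM whose kernel is computed on the extended, weighted vectors would bring the sketch closer to the scheme the abstract describes.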