A new hybrid approach to predict subcellular localization by incorporating protein evolutionary conservation information

  • Authors:
  • ShaoWu Zhang;YunLong Zhang;JunHui Li;HuiFeng Yang;YongMei Cheng;GuoPing Zhou

  • Affiliations:
  • College of Automation, Northwestern Polytechnical University, Xi'an, China;Department of Computer, First Aeronautical Institute of Air Force, Henan, China;College of Automation, Northwestern Polytechnical University, Xi'an, China;College of Automation, Northwestern Polytechnical University, Xi'an, China;College of Automation, Northwestern Polytechnical University, Xi'an, China;Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts

  • Venue:
  • LSMS'07 Proceedings of the 2007 international conference on Life System Modeling and Simulation
  • Year:
  • 2007

Abstract

The rapidly increasing number of sequences entering genome databanks has created the need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. The development of statistical predictors of protein attributes generally consists of two core tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further separated into two subtasks: finding a mathematical expression that effectively represents a protein, and finding a powerful algorithm that performs the prediction accurately. Here, an improved evolutionary conservation algorithm is proposed to calculate a per-residue conservation score. Each protein is then represented as a feature vector created with multi-scale energy (MSE). In addition, the protein is represented by three other feature vectors, based on amino acid composition (AAC), the weighted auto-correlation function, and the moment descriptor method. Finally, a novel hybrid approach was developed by fusing the four feature-specific classifiers through a product rule system to predict 12 subcellular locations. Compared with existing methods, this new approach provides better predictive performance. High prediction accuracies were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing protein evolutionary information and fusing multi-feature classifiers are quite promising, and may also hold great potential as a useful vehicle in other areas of molecular biology.
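
Two of the steps named in the abstract, the amino acid composition (AAC) feature vector and the product-rule fusion of several classifiers, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so this is an assumption-laden illustration rather than the authors' implementation: the function names `amino_acid_composition` and `product_rule_fusion` are hypothetical, the product rule is taken in its common form (multiply each classifier's per-class probability, then pick the largest product), and the probabilities in the usage lines are made-up values, not results from the paper.

```python
import math

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional AAC vector: the fraction of each standard residue.

    Characters outside the 20 standard residues are ignored (an assumption;
    the abstract does not say how such residues are handled).
    """
    seq = sequence.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def product_rule_fusion(prob_rows, eps=1e-12):
    """Fuse classifier outputs with the product rule.

    prob_rows holds one row per classifier, each row giving that classifier's
    probability for every class. The per-class probabilities are multiplied
    (summed in log space for numerical stability) and renormalized.
    Returns (index of the winning class, fused probabilities).
    """
    n_classes = len(prob_rows[0])
    log_scores = [
        sum(math.log(row[k] + eps) for row in prob_rows)
        for k in range(n_classes)
    ]
    peak = max(log_scores)
    weights = [math.exp(s - peak) for s in log_scores]
    norm = sum(weights)
    fused = [w / norm for w in weights]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Usage with illustrative values: four classifiers (standing in for the
# MSE, AAC, auto-correlation, and moment descriptor classifiers) scoring
# three hypothetical locations.
aac = amino_acid_composition("MKVLAAGIVGLLLAQ")
label, fused = product_rule_fusion([
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.4, 0.4, 0.2],
    [0.7, 0.2, 0.1],
])
```

Working in log space avoids underflow when many small probabilities are multiplied; with 12 locations and four classifiers, as in the abstract, only the number of columns in each row would change.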