Prediction of protein subcellular locations using fuzzy k-NN method

  • Authors:
  • Ying Huang;Yanda Li

  • Affiliations:
  • State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Institute of Bioinformatics, Tsinghua University, Beijing 100084, People's Republic of China;State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Institute of Bioinformatics, Tsinghua University, Beijing 100084, People's Republic of China

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Quantified Score

Hi-index 3.84

Visualization

Abstract

Motivation: Protein localization data are a valuable information resource helpful in elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. Results: In this paper, fuzzy k-nearest neighbors (k-NN) algorithm has been introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed with a new data set derived from version 41.0 SWISS-PROT databank, the overall predictive accuracy about 80% has been achieved in a jackknife test. The result demonstrates the applicability of this relative simple method and possible improvement of prediction accuracy for the protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. Availability: Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm