A hybrid method based on artificial immune system and k-NN algorithm for better prediction of protein cellular localization sites

  • Authors:
  • Ibrahim Turkoglu;E. Didem Kaymaz

  • Affiliations:
  • Firat University, Technical Education Faculty, Department of Electronics and Computer Science, 23119 Elazig, Turkiye;Firat University, Technical Education Faculty, Department of Electronics and Computer Science, 23119 Elazig, Turkiye

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The use of machine learning tools in biological data analysis is increasing gradually. This is mainly because the effectiveness of classification and recognition systems has improved in a great deal to help medical experts in diagnosing. In this paper, we investigate the performance of an artificial immune system based k-nearest neighbors algorithm with and without cross-validation in a class of imbalanced problems from bioinformatics field. Furthermore, we used an unsupervised artificial immune system algorithm for reduction training data dimension and k-nearest neighbors algorithm for classification purpose. The conducted experiments showed the effectiveness of the proposed schema. By selecting the E. coli database, we could compare our classification accuracy with other methods which were presented in the literature. The proposed hybrid system produced much more accurate results than the Horton and Nakai's proposal [P. Horton, K. Nakai, A probabilistic classification system for predicting the cellular localization sites of proteins, in: Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, AAAI Press, St. Louis, 1996, pp. 109-115; P. Horton, K. Nakai, Better prediction of protein cellular localization sites with the k-nearest neighbors classifier, in: Proceedings of Intelligent Systems in Molecular Biology, Halkidiki, Greece, 1997, pp. 368-383]. Besides the accuracy improvement, one of the important aspects of the proposed methodology is the complexity. As the artificial immune system provided data reduction, the training complexity of the proposed system is considerably low against the k-nearest neighbors classifier.