Methods in case-based classification in bioinformatics: lessons learned

  • Authors:
  • Isabelle Bichindaritz

  • Affiliations:
  • University of Washington Tacoma, Institute of Technology, Tacoma, Washington

  • Venue:
  • ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
  • Year:
  • 2011

Abstract

Bioinformatics datasets are often used to compare classification algorithms on high-dimensional data. Because genetic data are increasingly used routinely in medical settings, researchers and life scientists alike are interested in questions such as identifying the gene signature of a disease, classifying samples for diagnosis, or evaluating the severity of a disease. Many different types of algorithms have been applied to this domain, often with comparable though slightly different results, so deciding which one to use, and how to make that decision, can be difficult. This paper therefore studies the performance of K-nearest-neighbor and related case-based classification algorithms on some of the most benchmarked datasets in bioinformatics, in order to make methodological recommendations for applying these algorithms in this domain. In conclusion, when combined with feature selection methods, K-nearest-neighbor classifiers perform on par with, or among, the best classifiers.
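The abstract does not say which feature selection methods were paired with K-nearest-neighbor. The snippet below is a minimal, illustrative sketch of the general combination. It assumes a two-class problem and ranks features (genes) with a simple signal-to-noise filter, |μ₀ − μ₁| / (σ₀ + σ₁). That filter is an assumed choice, not necessarily one used in the paper. The ranking feeds a plain majority-vote K-nearest-neighbor classifier that uses Euclidean distance on the selected features only. The function names `rank_features` and `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def rank_features(X, y, k_features):
    """Hypothetical filter: rank features by a two-class signal-to-noise score
    |mu0 - mu1| / (sd0 + sd1) and return the indices of the top k_features."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch assumes a two-class problem")
    scores = []
    for j in range(len(X[0])):
        stats = []
        for c in classes:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            mu = sum(vals) / len(vals)
            sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
            stats.append((mu, sd))
        (mu0, sd0), (mu1, sd1) = stats
        # small epsilon avoids division by zero for constant features
        scores.append(abs(mu0 - mu1) / (sd0 + sd1 + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k_features]

def knn_predict(X_train, y_train, x, k, features):
    """Majority vote among the k nearest training samples, using Euclidean
    distance computed on the selected feature indices only."""
    dists = []
    for row, lab in zip(X_train, y_train):
        d = math.sqrt(sum((row[j] - x[j]) ** 2 for j in features))
        dists.append((d, lab))
    dists.sort(key=lambda t: t[0])
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): feature 0 separates the classes, features 1 and 2 are noise.
X = [[1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [0.9, 3.0, 0.5],
     [4.0, 4.8, 0.4], [4.2, 1.1, 0.8], [3.9, 3.1, 0.6]]
y = ["A", "A", "A", "B", "B", "B"]

selected = rank_features(X, y, 1)
print(selected)                                            # [0]
print(knn_predict(X, y, [1.1, 4.9, 0.4], 3, selected))     # A
print(knn_predict(X, y, [4.1, 5.0, 0.5], 3, selected))     # B
```

In a benchmarking setting, the feature ranking should be computed inside each training fold only. Selecting genes on the full dataset before cross-validation tends to produce optimistically biased accuracy estimates.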