Methods in case-based classification in bioinformatics: lessons learned

Authors: Isabelle Bichindaritz
Affiliations: University of Washington Tacoma, Institute of Technology, Tacoma, Washington
Venue: ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
Year: 2011

Abstract

Bioinformatics datasets are often used to compare classification algorithms for high-dimensional data. As genetic data are used more and more routinely in medical settings, researchers and life scientists alike want to find the gene signature of a disease, classify data for diagnosis, or evaluate the severity of a disease. Many different types of algorithms have been applied to this domain, often with comparable though slightly different results, so it can be difficult to decide which one to use and how to make that decision. This paper therefore studies, on some of the most frequently benchmarked datasets in bioinformatics, the performance of K-nearest-neighbor and related case-based classification algorithms in order to make methodological recommendations for applying them to this domain. In conclusion, K-nearest-neighbor classifiers, when combined with feature selection methods, perform as well as or among the best of the algorithms compared.
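
To make the combination named in the abstract concrete, the following is a minimal sketch of a K-nearest-neighbor classifier applied after a feature selection step. The abstract does not say which feature selection method, distance measure, or value of K the paper uses, so everything here is an assumption chosen for illustration: the two-class signal-to-noise filter, the Euclidean distance, the function names `select_features` and `knn_predict`, and the toy expression values.

```python
import math
from collections import Counter


def select_features(X, y, n_keep):
    """Keep the n_keep features with the highest two-class signal-to-noise score.

    Assumption: this filter is illustrative only; the source does not name
    the feature selection method it evaluates.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(X[0])):
        # Split the values of feature j by class label.
        groups = [[row[j] for row, label in zip(X, y) if label == c] for c in classes]
        means = [sum(g) / len(g) for g in groups]
        stds = [
            math.sqrt(sum((v - m) ** 2 for v in g) / len(g))
            for g, m in zip(groups, means)
        ]
        # Large gap between class means relative to spread = informative feature.
        scores.append(abs(means[0] - means[1]) / (stds[0] + stds[1] + 1e-9))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def knn_predict(X_train, y_train, x, k, features):
    """Majority vote among the k training cases nearest to x (Euclidean distance),
    measured on the selected features only."""
    neighbors = sorted(
        (math.dist([row[j] for j in features], [x[j] for j in features]), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: columns 0 and 2 separate the classes, 1 and 3 are noise.
X_train = [
    [1.0, 5.0, 0.9, 3.0],
    [1.2, 1.0, 1.1, 7.0],
    [0.8, 9.0, 1.0, 2.0],
    [3.0, 4.0, 3.1, 6.0],
    [3.2, 8.0, 2.9, 1.0],
    [2.8, 2.0, 3.0, 5.0],
]
y_train = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]

selected = select_features(X_train, y_train, n_keep=2)
prediction = knn_predict(X_train, y_train, [2.9, 9.0, 3.0, 2.0], k=3, features=selected)
print(selected, prediction)
```

In a real evaluation the features would have to be selected on the training folds only. Ranking them on the full dataset before cross-validation leaks label information and inflates the measured accuracy, which matters most when features far outnumber samples.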