Comparison of reuse strategies for case-based classification in bioinformatics

Authors:
Isabelle Bichindaritz
Affiliations:
Institute of Technology, University of Washington Tacoma, Tacoma, Washington
Venue:
ICCBR'11 Proceedings of the 19th international conference on Case-Based Reasoning Research and Development
Year:
2011

Abstract

Bioinformatics poses an interesting challenge for data mining algorithms because its data are high-dimensional while the number of samples is comparatively small. Case-based classification algorithms have been applied successfully to bioinformatics data and often serve as a reference for other algorithms. This paper therefore studies, on some of the most widely benchmarked datasets in bioinformatics, the performance of different reuse strategies in case-based classification, in order to make methodological recommendations for applying these algorithms in this domain. The study concludes that k-nearest-neighbor (kNN) classifiers coupled with between-group to within-group sum of squares (BSS/WSS) feature selection can perform as well as, and even better than, the best benchmarked algorithms to date. However, the choice of reuse strategy played a major role in optimizing the algorithms. In particular, optimizing both the number k of neighbors and the number of features taken into account was key to improving classification accuracy.
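
The combination described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy data, the Euclidean distance, the unweighted majority vote (one simple reuse strategy among several possible ones), and the leave-one-out grid search are all assumptions made for illustration. The BSS/WSS score of a feature is taken here as the ratio of the between-class to the within-class sum of squares of that feature.

```python
from collections import Counter, defaultdict
import math


def bss_wss_scores(X, y):
    """Return one BSS/WSS ratio per feature (higher = more discriminative)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        bss = wss = 0.0
        for rows in by_class.values():
            class_mean = sum(row[j] for row in rows) / len(rows)
            bss += len(rows) * (class_mean - overall_mean) ** 2
            wss += sum((row[j] - class_mean) ** 2 for row in rows)
        if wss > 0:
            scores.append(bss / wss)
        else:
            # No within-class spread: perfect separation, or a constant feature.
            scores.append(math.inf if bss > 0 else 0.0)
    return scores


def top_features(X, y, n_features):
    """Indices of the n_features highest-scoring features."""
    scores = bss_wss_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_features]


def knn_predict(X_train, y_train, query, k, features):
    """Majority vote among the k nearest cases (assumed reuse strategy)."""
    dists = []
    for row, label in zip(X_train, y_train):
        d = math.sqrt(sum((row[j] - query[j]) ** 2 for j in features))
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, n_features):
    """Leave-one-out accuracy; features are re-selected inside each fold."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(X_tr, y_tr, n_features)
        if knn_predict(X_tr, y_tr, X[i], k, feats) == y[i]:
            correct += 1
    return correct / len(X)


def grid_search(X, y, k_values, n_feature_values):
    """Return (accuracy, k, n_features) for the best pair found."""
    best = None
    for k in k_values:
        for m in n_feature_values:
            acc = loo_accuracy(X, y, k, m)
            if best is None or acc > best[0]:
                best = (acc, k, m)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 4.0], [1.2, 2.0], [1.1, 0.0]]
y = ["a", "a", "a", "b", "b", "b"]
best_accuracy, best_k, best_m = grid_search(X, y, [1, 3], [1, 2])
```

Re-selecting the features inside each leave-one-out fold keeps the held-out case from influencing the feature ranking, which matters when features vastly outnumber samples. Real microarray data would call for a vectorized implementation, since this pure-Python version is written for readability only.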