Methods in case-based classification in bioinformatics: lessons learned
ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
Bioinformatics poses an interesting challenge for data mining algorithms because its data are high-dimensional while the number of samples is comparatively small. Case-based classification algorithms have been applied successfully to bioinformatics data and often serve as a reference for other algorithms. This paper therefore studies, on some of the most widely benchmarked datasets in bioinformatics, the performance of different reuse strategies in case-based classification, in order to make methodological recommendations for applying these algorithms to this domain. The results show that k-nearest-neighbor (kNN) classifiers coupled with between-group to within-group sum of squares (BSS/WSS) feature selection can perform as well as, and even better than, the best benchmarked algorithms to date. However, the choice of reuse strategy played a major role in optimizing the algorithms. In particular, optimizing both the number k of neighbors and the number of features taken into account was key to improving classification accuracy.
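To make the combination described in the abstract concrete, the following is a minimal sketch of BSS/WSS feature ranking followed by a kNN majority vote, with a leave-one-out search over k and the number of retained features. It is not the paper's implementation: the function names, the toy data, the Euclidean distance, the majority-vote reuse strategy, and the leave-one-out protocol are all assumptions made for illustration, and the BSS/WSS score follows the commonly used per-feature definition (between-class over within-class sum of squares).

```python
from collections import Counter
from math import dist


def bss_wss_scores(X, y):
    """Per-feature ratio of between-class to within-class sum of squares."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        bss = wss = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = sum(values) / len(values)
            bss += len(values) * (class_mean - overall_mean) ** 2
            wss += sum((v - class_mean) ** 2 for v in values)
        if wss > 0:
            scores.append(bss / wss)
        else:
            # No within-class spread: perfect separator or constant feature.
            scores.append(float("inf") if bss > 0 else 0.0)
    return scores


def top_features(X, y, n_features):
    """Indices of the n_features columns with the highest BSS/WSS score."""
    scores = bss_wss_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_features]


def knn_predict(X_train, y_train, x, k, features):
    """Majority vote among the k nearest training cases (Euclidean distance)."""
    def project(row):
        return [row[j] for j in features]

    target = project(x)
    neighbors = sorted(
        zip(X_train, y_train), key=lambda pair: dist(project(pair[0]), target)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, the label seen first (nearest neighbor first) wins.
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k, n_features):
    """Leave-one-out accuracy; features are re-selected inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = top_features(X_train, y_train, n_features)
        correct += knn_predict(X_train, y_train, X[i], k, features) == y[i]
    return correct / len(X)


def grid_search(X, y, ks, feature_counts):
    """Return the (k, n_features) pair with the best leave-one-out accuracy."""
    grid = [(k, m) for k in ks for m in feature_counts]
    return max(grid, key=lambda km: loocv_accuracy(X, y, *km))


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
X = [[1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [0.9, 3.0, 0.1],
     [3.0, 4.0, 0.8], [3.1, 2.0, 0.2], [2.9, 6.0, 0.5]]
y = ["A", "A", "A", "B", "B", "B"]

best_k, best_n_features = grid_search(X, y, ks=[1, 3], feature_counts=[1, 2, 3])
```

Re-ranking the features inside each leave-one-out fold keeps the held-out case from influencing feature selection, which matters when samples are few and features are many. Real microarray data would need a vectorized implementation; this version favors readability.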