Guidelines to Select Machine Learning Scheme for Classification of Biomedical Datasets

Authors:
Ajay Kumar Tanwani;Jamal Afridi;M. Zubair Shafiq;Muddassar Farooq
Affiliations:
Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan (all four authors)
Venue:
EvoBIO '09 Proceedings of the 7th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2009

Abstract

Biomedical datasets pose a unique challenge to machine learning and data mining algorithms for classification because of their high dimensionality, multiple classes, noisy data, and missing values. This paper provides a comprehensive evaluation of a diverse set of machine learning schemes on a number of biomedical datasets. To this end, we follow a four-step evaluation methodology: (1) pre-processing the datasets to remove any redundancy; (2) classifying the datasets using six different machine learning algorithms: Naive Bayes (probabilistic), multi-layer perceptron (neural network), SMO (support vector machine), IBk (instance-based learner), J48 (decision tree), and RIPPER (rule-based induction); (3) bagging and boosting each algorithm; and (4) combining the best version of each base classifier into a team of classifiers with stacking and voting techniques. Using this methodology, we have performed experiments on 31 different biomedical datasets. To the best of our knowledge, this is the first study in which such a diverse set of machine learning algorithms is evaluated on so many biomedical datasets. The important outcome of our extensive study is a set of promising guidelines that will help researchers choose the best classification scheme for a particular type of biomedical dataset.
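
Steps (3) and (4) of the methodology can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dataset `TRAIN`, the function names, and the choice of a 1-nearest-neighbour rule as a stand-in for an instance-based learner are all assumptions made for illustration. It shows bagging (each model is trained on a bootstrap resample of the training set) followed by majority voting over the models' predictions. Boosting and stacking are not shown.

```python
import random
from collections import Counter

# Hypothetical toy data: (feature vector, class label). Not from the paper.
TRAIN = [
    ((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"), ((0.9, 1.1), "benign"),
    ((4.0, 4.2), "malignant"), ((4.1, 3.9), "malignant"), ((3.8, 4.0), "malignant"),
]

def nearest_neighbor(train, x):
    """Predict the label of the training row closest to x (squared Euclidean distance)."""
    _, label = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return label

def majority_vote(labels):
    """Voting: return the most frequent label among the individual predictions."""
    return Counter(labels).most_common(1)[0][0]

def bagged_predict(train, x, n_models=11, seed=0):
    """Bagging: build each model on a bootstrap resample, then combine by voting."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    votes = []
    for _ in range(n_models):
        # Sample len(train) rows with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(nearest_neighbor(sample, x))
    return majority_vote(votes)

if __name__ == "__main__":
    print(bagged_predict(TRAIN, (1.1, 0.9)))
```

The same `majority_vote` helper could combine the predictions of heterogeneous base classifiers, which is the voting scheme named in step (4); stacking would instead train a second-level learner on those predictions.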