Guidelines to Select Machine Learning Scheme for Classification of Biomedical Datasets

  • Authors:
  • Ajay Kumar Tanwani;Jamal Afridi;M. Zubair Shafiq;Muddassar Farooq

  • Affiliations:
  • Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan;Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan;Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan;Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan

  • Venue:
  • EvoBIO '09 Proceedings of the 7th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Biomedical datasets pose a unique challenge to machine learning and data mining algorithms for classification because of their high dimensionality, multiple classes, noisy data and missing values. This paper provides a comprehensive evaluation of a set of diverse machine learning schemes on a number of biomedical datasets. To this end, we follow a four step evaluation methodology: (1) pre-processing the datasets to remove any redundancy, (2) classification of the datasets using six different machine learning algorithms; Naive Bayes (probabilistic), multi-layer perceptron (neural network), SMO (support vector machine), IBk (instance based learner), J48 (decision tree) and RIPPER (rule-based induction), (3) bagging and boosting each algorithm, and (4) combining the best version of each of the base classifiers to make a team of classifiers with stacking and voting techniques. Using this methodology, we have performed experiments on 31 different biomedical datasets. To the best of our knowledge, this is the first study in which such a diverse set of machine learning algorithms are evaluated on so many biomedical datasets. The important outcome of our extensive study is a set of promising guidelines which will help researchers in choosing the best classification scheme for a particular nature of biomedical dataset.