The Role of Biomedical Dataset in Classification

Authors:
Ajay Kumar Tanwani;Muddassar Farooq
Affiliations:
Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NUCES), Islamabad, Pakistan 44000 (both authors)
Venue:
AIME '09 Proceedings of the 12th Conference on Artificial Intelligence in Medicine: Artificial Intelligence in Medicine
Year:
2009

Abstract

In this paper, we investigate the role of a biomedical dataset in determining the classification accuracy of an algorithm. We quantify the complexity of a biomedical dataset using five complexity measures: correlation-based feature selection subset merit, noise, imbalance ratio, missing values and information gain. The effect of these complexity measures on classification accuracy is evaluated using five diverse machine learning algorithms: J48 (decision tree), SMO (support vector machine), Naive Bayes (probabilistic), IBk (instance-based learner) and JRIP (rule-based induction). The results of our experiments show that noise and correlation-based feature selection subset merit, rather than the particular choice of algorithm, play a major role in determining the classification accuracy. Finally, we provide researchers with a meta-model and an empirical equation to estimate the classification potential of a dataset on the basis of its complexity. This will help researchers to efficiently pre-process the dataset for automatic knowledge extraction.
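
Three of the five complexity measures named in the abstract (missing values, imbalance ratio and information gain) can be illustrated with a short sketch. The abstract does not give the formulas used in the paper, so the definitions below are assumptions: missing values as the fraction of empty feature cells, imbalance ratio as majority-class count divided by minority-class count, and information gain as the standard entropy reduction for a discrete feature. All function names are hypothetical, and the paper's own definitions may differ.

```python
import math
from collections import Counter


def missing_fraction(rows):
    """Fraction of feature cells equal to None (assumed missing-value marker)."""
    cells = [value for row in rows for value in row]
    if not cells:
        return 0.0
    return sum(value is None for value in cells) / len(cells)


def imbalance_ratio(labels):
    """Majority-class count over minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the expected entropy after splitting
    on each distinct value of a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Tiny made-up dataset, for illustration only.
    rows = [[1, None], [2, 3]]
    labels = ["sick", "sick", "healthy", "healthy"]
    feature = [0, 0, 1, 1]
    print(missing_fraction(rows))
    print(imbalance_ratio(labels))
    print(information_gain(feature, labels))
```

Computing such measures before training is one way to judge how much pre-processing a dataset may need; noise and correlation-based feature selection subset merit are omitted here because the abstract gives too little detail to sketch them safely.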