A new dataset evaluation method based on category overlap

Authors:
Sejong Oh
Affiliations:
Department of Nanobiomedical Science, Dankook University, Cheonan 330-714, Republic of Korea
Venue:
Computers in Biology and Medicine
Year:
2011


Abstract

The quality of a dataset has a profound effect on classification accuracy, so a method for evaluating that quality is needed. In this paper, we propose a new dataset evaluation method using the R-value measure. The method is based on the ratio of overlapping areas among the categories in a dataset. A high R-value indicates that the categories overlap widely, so classification accuracy on the dataset is likely to be low. The R-value measure can be used to understand the characteristics of a dataset, to guide the feature selection process, and to inform the design of new classifiers.
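
The abstract does not give the exact formula, so the following is only a minimal sketch of one way an overlap ratio of this kind could be computed. It assumes that an instance lies in an overlapping area when more than `theta` of its `k` nearest neighbors (Euclidean distance) belong to a different category. The function name `r_value`, the parameters `k` and `theta`, and their default values are illustrative assumptions, not taken from the paper.

```python
import math


def r_value(points, labels, k=3, theta=1):
    """Return the fraction of instances that sit in an overlapping area.

    Assumption (not from the abstract): an instance counts as overlapping
    when more than `theta` of its `k` nearest neighbors carry a different
    category label.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more than k instances")

    overlapping = 0
    for i in range(n):
        # Distances from instance i to every other instance, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        # Count neighbors among the k nearest that belong to another category.
        foreign = sum(1 for _, j in dists[:k] if labels[j] != labels[i])
        if foreign > theta:
            overlapping += 1

    return overlapping / n


# Two well-separated clusters: no instance has foreign neighbors.
separated = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11)]
separated_labels = ["a"] * 4 + ["b"] * 4
print(r_value(separated, separated_labels, k=3, theta=1))

# Categories interleaved along a line: every instance has foreign neighbors.
interleaved = [(0,), (1,), (2,), (3,), (4,), (5,)]
interleaved_labels = ["a", "b", "a", "b", "a", "b"]
print(r_value(interleaved, interleaved_labels, k=2, theta=0))
```

Under these assumptions, a value near 0 corresponds to well-separated categories and a value near 1 to heavily interleaved ones, which matches the interpretation given in the abstract.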