Automatic feature selection for classification of health data

Authors:
Hongxing He; Huidong Jin; Jie Chen
Affiliations:
CSIRO Mathematical and Information Sciences, Canberra, ACT, Australia; CSIRO Mathematical and Information Sciences, Canberra, ACT, Australia; CSIRO Mathematical and Information Sciences, Canberra, ACT, Australia
Venue:
AI'05 Proceedings of the 18th Australian Joint Conference on Advances in Artificial Intelligence
Year:
2005

Abstract

We propose FIEBIT (Feature Inclusion and Exclusion Based on Information Theory), a fast and accurate feature selection method for the classification of health data. FIEBIT selects the most relevant and non-redundant features using Conditional Mutual Information (CMI), and excludes irrelevant and redundant features by comparing Individual Symmetrical Uncertainty (ISU) with Combined Symmetrical Uncertainty (CSU). Small feature subsets are selected before classification without compromising classification accuracy, and the size of the feature subset is determined automatically. Our preliminary empirical results on health data with hundreds of features suggest that FIEBIT is efficient and effective in comparison with representative feature selection methods.
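The two information-theoretic quantities named in the abstract can be illustrated with a minimal sketch for discrete features. This is not the authors' FIEBIT implementation: the abstract does not specify how the quantities are combined or how ISU and CSU are defined, so the code only uses the standard textbook definitions of conditional mutual information and symmetrical uncertainty. The function names and the toy columns (`label`, `f1`, `f2`, `f3`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)); returns 0.0 if both are constant."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mutual_information(x, y) / denom


# Hypothetical toy data: f1 matches the class, f2 duplicates f1,
# and f3 is statistically independent of the class.
label = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
f3 = [0, 1, 0, 1, 0, 1, 0, 1]

su_f1 = symmetrical_uncertainty(f1, label)   # relevant feature
su_f3 = symmetrical_uncertainty(f3, label)   # irrelevant feature
# Information f2 adds about the class once f1 is already known.
cmi_f2_given_f1 = conditional_mutual_information(f2, label, f1)
```

In this toy setting the relevant feature `f1` has high symmetrical uncertainty with the class, the independent feature `f3` has none, and the duplicate `f2` contributes no conditional mutual information once `f1` is known, which is the kind of relevance and redundancy signal a method in this family would act on.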