Demoting redundant features to improve the discriminatory ability in cancer data

Authors:
M. Osl;S. Dreiseitl;F. Cerqueira;M. Netzer;B. Pfeifer;C. Baumgartner
Affiliations:
Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria;Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria and Department of Software Engineering, Upper Austria University of ...;Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria;Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria;Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria;Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

Identifying a set of relevant but non-redundant features is an important first step in building predictive and diagnostic models from biomedical data sets. Most commonly, individual features are ranked by a quality criterion, and the best (first) k features of the ranking are selected. However, feature ranking methods do not sufficiently account for interactions and correlations between features, so the selected features are likely to contain redundancy. We present a new algorithm, termed Redundancy Demoting (RD), that takes an arbitrary feature ranking as input and improves it by identifying redundant features and demoting them to positions in the ranking at which they are no longer redundant. A feature is considered redundant if it is correlated with other features and is not relevant, in the sense that it does not improve the discriminatory ability of a set of features. Experiments on two cancer data sets, one melanoma image data set and one lung cancer microarray data set, show that our algorithm greatly improves the predictive power of the feature rankings produced by information gain, ReliefF and Student's t-test.
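
The idea of demoting redundant features can be illustrated with a minimal Python sketch. It is not the authors' RD implementation, since the abstract does not give the algorithm's details. The sketch assumes that correlation is measured by the absolute Pearson coefficient against a hypothetical threshold `corr_threshold`, that the relevance test is supplied by the caller as a hypothetical callback `improves(kept, candidate)` (for example, a check whether a classifier's discriminatory ability increases), and, as a simplification, that demoted features are moved to the end of the ranking rather than to the exact position at which they stop being redundant.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def demote_redundant(
    ranking: List[str],
    columns: Dict[str, Sequence[float]],
    improves: Callable[[List[str], str], bool],
    corr_threshold: float = 0.9,  # assumed value, not taken from the paper
) -> List[str]:
    """Return a new ranking in which redundant features are moved to the end.

    A feature is treated as redundant when it is strongly correlated with a
    feature already kept AND the caller-supplied `improves` test reports that
    adding it does not improve discriminatory ability.
    """
    kept: List[str] = []
    demoted: List[str] = []
    for feature in ranking:
        correlated = any(
            abs(pearson(columns[feature], columns[k])) >= corr_threshold
            for k in kept
        )
        if correlated and not improves(kept, feature):
            demoted.append(feature)  # simplification: demote to the tail
        else:
            kept.append(feature)
    return kept + demoted


# Toy usage: "b" is a scaled copy of "a" and adds nothing, so it is demoted.
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
new_ranking = demote_redundant(["a", "b", "c"], toy_columns, lambda kept, f: False)
print(new_ranking)  # ['a', 'c', 'b']
```

In practice, the `improves` callback would wrap whatever measure of discriminatory ability is in use, and the input ranking could come from any ranking method, such as information gain, ReliefF or a t-test.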