Random Reducts: A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets

Authors:
Marcin Kruczyk;Nicholas Baltzer;Jakub Mieczkowski;Michał Dramiński;Jacek Koronacki;Jan Komorowski
Affiliations:
Postgraduate School of Molecular Medicine, Warsaw, Poland. marcin.kruczyk@icm.uu.se;Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden. nicholas.baltzer@gmail.com;Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland. j.mieczkowski@nencki.gov.pl;Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland. michal.draminski@gmail.com;Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland. Jacek.Koronacki@ipipan.waw.pl;Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland. jan.komorowski@lcb.uu.se
Venue:
Fundamenta Informaticae - To Andrzej Skowron on His 70th Birthday
Year:
2013


Abstract

Feature selection is an important step before constructing a classifier for a very large data set. In many problems it is possible to find a subset of attributes that has the same discriminative power as the full data set. Many feature selection methods exist, but none of them combines Rough Set models with statistical argumentation. Moreover, known feature selection methods usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR), a feature selection method that precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features, of which the user should be made aware, and to find interactions in the data set.
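
The abstract gives no algorithmic detail, so the following is only a minimal illustrative sketch of the general idea (reducts computed on randomly sampled attribute subsets, then counted), not the authors' published procedure. All function names, the scoring rule (how often an attribute lands in a reduct when it is sampled), the backward-elimination reduct search, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from itertools import product

def positive_region_size(rows, labels, attrs):
    """Count rows whose equivalence class under `attrs` carries a single label."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    return sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)

def one_reduct(rows, labels, attrs, rng):
    """Backward elimination in random order (assumed heuristic).

    Returns a minimal subset of `attrs` with the same positive region.
    """
    target = positive_region_size(rows, labels, attrs)
    kept = list(attrs)
    order = list(attrs)
    rng.shuffle(order)
    for a in order:
        trial = [x for x in kept if x != a]
        if positive_region_size(rows, labels, trial) == target:
            kept = trial  # `a` is dispensable given the attributes still kept
    return kept

def random_reduct_scores(rows, labels, attrs, subset_size, iterations, seed=0):
    """Hypothetical score: fraction of draws in which an attribute enters a reduct."""
    rng = random.Random(seed)
    sampled, selected = Counter(), Counter()
    for _ in range(iterations):
        subset = rng.sample(attrs, subset_size)  # Monte Carlo attribute subset
        sampled.update(subset)
        selected.update(one_reduct(rows, labels, subset, rng))
    return {a: (selected[a] / sampled[a] if sampled[a] else 0.0) for a in attrs}

# Toy synthetic data (an assumption): the label is a XOR b, `a_copy`
# duplicates `a` (a shadowed feature), and `noise` is irrelevant.
rows, labels = [], []
for a, b, noise in product([0, 1], repeat=3):
    rows.append({"a": a, "b": b, "a_copy": a, "noise": noise})
    labels.append(a ^ b)

scores = random_reduct_scores(
    rows, labels, ["a", "b", "a_copy", "noise"],
    subset_size=3, iterations=200, seed=0,
)
print(scores)
```

In this toy setting `noise` never enters a reduct, while `a`, `b`, and the shadowed `a_copy` all receive positive scores. The reason is that `a_copy` is sometimes sampled without `a`, and when both are sampled the random elimination order decides which of the two is dropped. The XOR label also means neither `a` nor `b` is informative alone, so a positive score for both reflects their interaction.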