Random Reducts: A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets

  • Authors:
  • Marcin Kruczyk;Nicholas Baltzer;Jakub Mieczkowski;Michał Dramiński;Jacek Koronacki;Jan Komorowski

  • Affiliations:
  • Postgraduate School of Molecular Medicine, Warsaw, Poland. marcin.kruczyk@icm.uu.se;Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden. nicholas.baltzer@gmail.com;Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland. j.mieczkowski@nencki.gov.pl;Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland. michal.draminski@gmail.com;Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland. Jacek.Koronacki@ipipan.waw.pl;Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland. jan.komorowski@lcb.uu.se

  • Venue:
  • Fundamenta Informaticae - To Andrzej Skowron on His 70th Birthday
  • Year:
  • 2013

Abstract

Feature selection is an important step prior to constructing a classifier for a very large data set. In many problems it is possible to find a subset of attributes that has the same discriminative power as the full attribute set. Many feature selection methods exist, but none of them ties Rough Set models to statistical argumentation. Moreover, known feature selection methods usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR), a feature selection method that precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features, of which the user should be made aware, and to find interactions in the data set.
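
The general idea of combining Monte Carlo subsampling with rough-set reducts can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the scoring rule, the subset size, or the reduct algorithm, so everything below is an assumption made for illustration. The hypothetical helper `greedy_reduct` finds one reduct by dropping attributes while the positive region is preserved, and the hypothetical function `random_reduct_scores` simply counts how often each attribute appears in a reduct of a randomly drawn attribute subset.

```python
import random
from collections import Counter
from itertools import product


def positive_region_size(rows, labels, attrs):
    """Count objects whose indiscernibility class (w.r.t. attrs) has one label."""
    classes = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    return sum(len(v) for v in classes.values() if len(set(v)) == 1)


def greedy_reduct(rows, labels, attrs, rng):
    """Assumed heuristic: drop attributes in random order while the
    positive region of the starting subset stays the same size."""
    target = positive_region_size(rows, labels, attrs)
    reduct = list(attrs)
    order = list(attrs)
    rng.shuffle(order)
    for a in order:
        trial = [b for b in reduct if b != a]
        if positive_region_size(rows, labels, trial) == target:
            reduct = trial
    return reduct


def random_reduct_scores(rows, labels, n_attrs, subset_size, n_iter, seed=0):
    """Assumed scoring: number of sampled subsets whose reduct contains
    each attribute. The published method may weight this differently."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        subset = rng.sample(range(n_attrs), subset_size)
        counts.update(greedy_reduct(rows, labels, subset, rng))
    return counts


# Toy decision table (illustrative only): the label is a0 XOR a1,
# attribute 2 duplicates attribute 0 (a shadowed feature),
# and attribute 3 is independent noise.
rows = [(a0, a1, a0, n) for a0, a1, n in product([0, 1], repeat=3)]
labels = [a0 ^ a1 for a0, a1, _, _ in rows]

scores = random_reduct_scores(rows, labels, n_attrs=4, subset_size=3, n_iter=200)
for attr in range(4):
    print(attr, scores[attr])
```

In this toy table the noise attribute can always be dropped without shrinking the positive region, so it never enters a reduct, while the duplicated attribute still receives a non-zero count whenever its twin is absent from the sampled subset. That is the sense in which random subsetting can surface shadowed features that a single reduct would hide.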