Boruta - A System for Feature Selection

Authors:
Miron B. Kursa; Aleksander Jankowski; Witold R. Rudnicki
Affiliations:
ICM, University of Warsaw, Pawińskiego 5a, Warsaw, Poland (all authors). Correspondence: W.Rudnicki@icm.edu.pl
Venue:
Fundamenta Informaticae
Year:
2010

Abstract

Machine learning methods are often used to classify objects described by hundreds of attributes; in many such applications a large fraction of the attributes may be entirely irrelevant to the classification problem. Moreover, one usually cannot decide a priori which attributes are relevant. In this paper we present an improved version of the algorithm for identifying the full set of truly important variables in an information system. It extends the random forest method and utilises the importance measure generated by the original algorithm. It iteratively compares the importances of the original attributes with the importances of their randomised copies. We analyse the performance of the algorithm on several examples of synthetic data, as well as on a biologically important problem: the identification of sequence motifs that are important for the aptameric activity of short RNA sequences.
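
The comparison of original attributes with their randomised copies can be illustrated with a minimal Python sketch. It is not the authors' implementation. To stay dependency-free it swaps the random forest importance for a toy measure (absolute Pearson correlation with the class label), and it assumes that an attribute scores a "hit" in an iteration when its importance exceeds the best importance among the shuffled copies. The names `abs_correlation_importance` and `count_shadow_hits`, the number of iterations, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def abs_correlation_importance(columns, labels):
    """Toy stand-in for random forest importance (an assumption of this sketch):
    the absolute Pearson correlation of each column with the labels."""
    n = len(labels)
    mean_y = sum(labels) / n
    var_y = sum((y - mean_y) ** 2 for y in labels)
    scores = []
    for col in columns:
        mean_x = sum(col) / n
        var_x = sum((x - mean_x) ** 2 for x in col)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, labels))
        if var_x == 0 or var_y == 0:
            scores.append(0.0)  # a constant column carries no information
        else:
            scores.append(abs(cov) / (var_x * var_y) ** 0.5)
    return scores


def count_shadow_hits(columns, labels, importance_fn, n_iter=30, seed=0):
    """For each attribute, count the iterations in which its importance
    exceeds the best importance among freshly shuffled copies."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Randomised copies: same values, order shuffled, so any link
        # between the copy and the labels is destroyed.
        shadows = []
        for col in columns:
            shadow = list(col)
            rng.shuffle(shadow)
            shadows.append(shadow)
        scores = importance_fn(columns + shadows, labels)
        real_scores = scores[:len(columns)]
        best_shadow = max(scores[len(columns):])
        for j, score in enumerate(real_scores):
            if score > best_shadow:
                hits[j] += 1
    return hits


# Synthetic data: attribute 0 tracks the label, attributes 1 and 2 are noise.
data_rng = random.Random(42)
labels = [data_rng.randint(0, 1) for _ in range(200)]
relevant = [y + data_rng.gauss(0.0, 0.1) for y in labels]
noise_a = [data_rng.gauss(0.0, 1.0) for _ in labels]
noise_b = [data_rng.gauss(0.0, 1.0) for _ in labels]

n_iter = 30
hits = count_shadow_hits([relevant, noise_a, noise_b], labels,
                         abs_correlation_importance, n_iter=n_iter)
print(hits)
```

An attribute that beats the shuffled copies in nearly every iteration is a candidate for being relevant, while one that rarely does is a candidate for rejection. How hit counts are turned into accept or reject decisions is left out here, since the abstract does not specify it.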