In supervised machine learning, variable ranking aims to sort the input variables according to their relevance with respect to an output variable. In this paper, we propose a new relevance criterion for variable ranking in regression problems with a large number of variables. The criterion comes from a joint discretization of the input and output variables, derived as an extension of a Bayesian nonparametric discretization method for the classification case. To this end, we introduce a family of discretization grid models and a prior distribution defined on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data emphasizes the relation (or the absence of relation) between input and output and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree.
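To make the idea of ranking variables through a joint discretization of input and output concrete, the following is a minimal sketch and not the criterion proposed in the paper. It assumes a fixed equal-frequency grid and scores each input by the empirical mutual information of that grid, whereas the paper selects the most probable grid with an exact Bayesian criterion. The function names (`equal_frequency_bins`, `grid_mutual_information`, `rank_variables`), the grid size `n_bins`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def equal_frequency_bins(values, n_bins):
    """Map each value to one of n_bins intervals holding roughly equal counts."""
    # Interior quantiles serve as cut points; searchsorted yields ids 0..n_bins-1.
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, values, side="right")


def grid_mutual_information(x, y, n_bins=5):
    """Empirical mutual information (in nats) of an n_bins x n_bins grid.

    Stand-in score (assumption): the paper instead evaluates the posterior
    probability of a grid model, which also chooses the number of intervals.
    """
    ix = equal_frequency_bins(x, n_bins)
    iy = equal_frequency_bins(y, n_bins)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (ix, iy), 1)  # count instances falling in each grid cell
    joint = counts / counts.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal over input intervals
    py = joint.sum(axis=0, keepdims=True)  # marginal over output intervals
    nonzero = joint > 0  # skip empty cells to avoid log(0)
    independent = (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / independent)))


def rank_variables(X, y, n_bins=5):
    """Return column indices sorted by decreasing grid score, plus the scores."""
    scores = np.array(
        [grid_mutual_information(X[:, j], y, n_bins) for j in range(X.shape[1])]
    )
    return np.argsort(-scores), scores


# Synthetic illustration: only column 1 carries information about y.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
y = np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=2000)
order, scores = rank_variables(X, y)
print("ranking:", order, "scores:", np.round(scores, 3))
```

In this sketch the grid size is fixed in advance, so even irrelevant variables receive a small positive score from sampling noise. A model selection criterion such as the one described above is meant to avoid this by favouring a single-cell grid when no relation is supported by the data.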