Optimal Bayesian 2D-Discretization for Variable Ranking in Regression

Authors:
Marc Boullé;Carine Hue
Affiliations:
France Télécom R&D Lannion;France Télécom R&D Lannion
Venue:
DS'06: Proceedings of the 9th International Conference on Discovery Science
Year:
2006

Abstract

In supervised machine learning, variable ranking aims to sort the input variables by their relevance to an output variable. In this paper, we propose a new relevance criterion for variable ranking in regression problems with a large number of variables. The criterion is based on a joint discretization of the input and output variables, derived as an extension of a Bayesian nonparametric discretization method for classification. To this end, we introduce a family of discretization grid models and a prior distribution on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data highlights the relation (or absence of relation) between inputs and output and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree.
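The general idea of ranking inputs through a 2D grid over (input, output) pairs can be illustrated with a minimal sketch. It does not implement the Bayesian criterion described in the abstract, whose formula is not given here. As a stand-in, it uses a fixed equal-frequency grid and scores each input by the mutual information of the grid cell counts. The function names, the bin count, and the synthetic data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts.

    Assumption: ties are broken by sort order, which is acceptable for
    continuous data with few repeated values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def grid_mutual_information(x, y, n_bins=4):
    """Mutual information (in nats) of an n_bins x n_bins grid over (x, y).

    Stand-in relevance score, not the paper's Bayesian criterion.
    """
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    n = len(x)
    cell = Counter(zip(bx, by))  # joint counts per grid cell
    row = Counter(bx)            # marginal counts per input interval
    col = Counter(by)            # marginal counts per output interval
    return sum(
        (c / n) * math.log(c * n / (row[i] * col[j]))
        for (i, j), c in cell.items()
    )


def rank_variables(columns, y, n_bins=4):
    """Sort input variables by decreasing grid score."""
    scores = {
        name: grid_mutual_information(col, y, n_bins)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical synthetic data: y depends on x_rel only.
rng = random.Random(0)
x_rel = [rng.random() for _ in range(500)]
x_noise = [rng.random() for _ in range(500)]
y = [math.sin(6 * v) + 0.1 * rng.gauss(0, 1) for v in x_rel]

ranking = rank_variables({"x_rel": x_rel, "x_noise": x_noise}, y)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The main difference from the approach in the abstract is that this sketch fixes the grid in advance, whereas the paper selects the most probable grid from a family of models under a prior. Under that approach, the number and position of the intervals adapt to the data.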