In this paper, we consider the supervised learning task of predicting the normalized rank of a numerical variable. We introduce a novel probabilistic approach to estimate the posterior distribution of the target rank conditional on the predictors. We turn this learning task into a model selection problem. To this end, we define a 2D partitioning family obtained by discretizing numerical variables and grouping the values of categorical ones, and we derive an analytical criterion to select the partition with the highest posterior probability. We show how these partitions can be used to build univariate predictors, as well as multivariate ones under a naive Bayes assumption. We also propose a new evaluation criterion for probabilistic rank estimators. Based on the logarithmic score, this criterion has the advantage of being bounded below, which is not the case for the logarithmic score of probabilistic value estimators. A first set of experiments on synthetic data demonstrates the good properties of the proposed criterion and of our partitioning approach. A second set of experiments on real data shows that the univariate and selective naive Bayes rank estimators, projected onto the value range, perform competitively with the methods submitted to a recent challenge on probabilistic metric regression tasks. Our approach is applicable to any regression problem with categorical or numerical predictors. It is particularly interesting for problems with many predictors, as it automatically detects the variables that carry predictive information, and it builds pertinent predictors of the normalized rank of the numerical target from one or several predictors. As the selection criterion is regularized by a prior term and a posterior term, it does not suffer from overfitting.
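To make the setting concrete, the sketch below illustrates three ingredients mentioned in the abstract on synthetic data: mapping a numerical target to normalized ranks, estimating a piecewise-constant conditional rank density from a 2D partition of one predictor, and evaluating it with the logarithmic score. This is a hypothetical illustration with a hand-chosen equal-width grid, not the paper's method, which selects the partition by maximizing an analytical posterior-probability criterion; the midpoint rank convention and add-one smoothing are also assumptions made here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_ranks(y):
    # Midpoint normalized rank in (0, 1); the exact convention is an assumption.
    return (np.argsort(np.argsort(y)) + 0.5) / len(y)

# Synthetic data: the target depends on a single numerical predictor.
n = 2000
x = rng.uniform(0.0, 1.0, n)
y = x + 0.1 * rng.normal(size=n)
r = normalized_ranks(y)

# Hand-chosen equal-width 2D grid, purely for illustration
# (the paper selects the partition with a MAP criterion instead).
x_edges = np.linspace(0.0, 1.0, 11)   # 10 predictor intervals
r_edges = np.linspace(0.0, 1.0, 11)   # 10 rank intervals

ix = np.clip(np.digitize(x, x_edges) - 1, 0, 9)
ir = np.clip(np.digitize(r, r_edges) - 1, 0, 9)

counts = np.ones((10, 10))            # add-one smoothing to avoid log(0)
np.add.at(counts, (ix, ir), 1.0)

cell_prob = counts / counts.sum(axis=1, keepdims=True)  # P(rank cell | x cell)
density = cell_prob / np.diff(r_edges)                  # piecewise-constant density

# Mean negative log density of the true rank (logarithmic score).
log_score = -np.mean(np.log(density[ix, ir]))
```

Because the normalized rank lives in [0, 1] and the density is piecewise constant on a finite grid, the per-instance density cannot exceed 1/width of the narrowest rank cell, so the score is bounded below (here by -log 10); a value-density estimator on an unbounded range has no such bound, which is the advantage the abstract points to.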