Modelling complex data by learning which variable to construct

Authors:
Françoise Fessant;Aurélie Le Cam;Marc Boullé;Raphaël Féraud
Affiliations:
Orange Labs, Lannion, France;Orange Labs, Lannion, France;Orange Labs, Lannion, France;Orange Labs, Lannion, France
Venue:
DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
Year:
2010

Abstract

This paper addresses variable selection: choosing a subset of variables that is sufficient to predict the target label well. Instead of trying to determine directly which variables are better, we use prior knowledge to learn the properties of good variables and guide the selection towards the most relevant dimensions. To this end, we assume that a variable can be represented by a set of indicators that describe both the properties of the variable and its potential relationship to the targeting problem. This approach makes it possible to predict the relevance of a variable without measuring its values on the training instances. We devise a selection methodology that efficiently searches for new good variables among a huge number of candidates and dramatically reduces the number of variable measurements needed. Our algorithm is illustrated on an industrial CRM application.
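
The idea of predicting relevance from indicators can be illustrated with a minimal sketch. The abstract does not say which learner, indicators, or relevance measure the authors use, so everything below is an assumption made for illustration: the indicator vectors, the variable names, the relevance scores, and the nearest-neighbour predictor standing in for the learned model are all hypothetical.

```python
from math import dist

def predict_relevance(indicators, measured, k=1):
    """Estimate a variable's relevance from its indicator vector alone.

    `measured` is a list of (indicator_vector, relevance) pairs for variables
    whose relevance has already been measured on training data. The estimate
    is the mean relevance of the k measured variables with the closest
    indicator vectors (a stand-in for whatever model is actually learned).
    """
    nearest = sorted(measured, key=lambda m: dist(indicators, m[0]))[:k]
    return sum(rel for _, rel in nearest) / len(nearest)

def rank_candidates(candidates, measured, budget, k=1):
    """Return the `budget` candidate names with the highest predicted relevance.

    No candidate is measured on the training instances here; only its
    indicators are used.
    """
    scored = {name: predict_relevance(ind, measured, k)
              for name, ind in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:budget]

# Hypothetical indicators: (long aggregation window?, built from usage logs?)
measured = [
    ((1.0, 1.0), 0.80),
    ((1.0, 0.0), 0.30),
    ((0.0, 1.0), 0.60),
    ((0.0, 0.0), 0.05),
]

# Hypothetical candidate variables that have not been constructed or measured.
candidates = {
    "calls_90d_sum": (1.0, 1.0),
    "invoice_7d_count": (0.0, 0.0),
    "sms_7d_count": (0.0, 1.0),
}

selected = rank_candidates(candidates, measured, budget=2, k=1)
print(selected)
```

In this toy setting only the two selected candidates would then need to be constructed and measured, which is the kind of saving in variable measurements the abstract describes.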