Modelling complex data by learning which variable to construct

  • Authors:
  • Françoise Fessant;Aurélie Le Cam;Marc Boullé;Raphaël Féraud

  • Affiliations:
  • Orange Labs, Lannion, France;Orange Labs, Lannion, France;Orange Labs, Lannion, France;Orange Labs, Lannion, France

  • Venue:
  • DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
  • Year:
  • 2010

Abstract

This paper addresses the task of variable selection, which consists of choosing a subset of variables sufficient to predict the target label well. Instead of trying to determine directly which variables are better, we use prior knowledge to learn the properties of good variables and guide the selection towards the most relevant dimensions. To this end, we assume that a variable can be represented by a set of indicators that describe both the properties of the variable and its potential relationship to the targeting problem. This approach makes it possible to predict the relevance of a variable without measuring its value on the training instances. We devise a selection methodology that can efficiently search for new good variables among a huge number of candidates and dramatically reduce the number of variable measurements needed. Our algorithm is illustrated on an industrial CRM application.
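
The idea of predicting a variable's relevance from its indicators, rather than from its measured values, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the learner, so a simple k-nearest-neighbour average stands in for it here, and the indicator vectors, relevance scores, variable names, and function names (`predict_relevance`, `rank_candidates`) are all hypothetical.

```python
from math import dist

def predict_relevance(indicators, measured, k=2):
    """Estimate relevance from indicators alone, as the mean relevance
    of the k nearest already-measured variables (stand-in learner)."""
    neighbours = sorted(measured, key=lambda m: dist(indicators, m[0]))[:k]
    return sum(rel for _, rel in neighbours) / len(neighbours)

def rank_candidates(candidates, measured, k=2):
    """Rank unmeasured variables by predicted relevance, highest first."""
    scored = [(name, predict_relevance(ind, measured, k))
              for name, ind in candidates.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)

# Hypothetical prior knowledge: (indicator vector, measured relevance)
# for variables whose values were already measured on training data.
measured = [
    ((1.0, 0.0), 0.9),
    ((0.9, 0.1), 0.8),
    ((0.0, 1.0), 0.1),
    ((0.1, 0.9), 0.2),
]

# Hypothetical candidates, described only by their indicators.
candidates = {
    "var_a": (0.95, 0.05),
    "var_b": (0.05, 0.95),
    "var_c": (0.5, 0.5),
}

ranking = rank_candidates(candidates, measured)
# Only the top-ranked candidate would then be measured on the data.
to_measure = [name for name, _ in ranking[:1]]
```

In this toy setting only the candidates in `to_measure` would need to be computed on the training instances, which is the source of the savings in variable measurements that the abstract describes.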