Informative variables selection for multi-relational supervised learning

  • Authors:
  • Dhafer Lahbib;Marc Boullé;Dominique Laurent

  • Affiliations:
  • France Telecom R&D - 2, Lannion and ETIS-CNRS-Universite de Cergy Pontoise-ENSEA, Cergy Pontoise;France Telecom R&D - 2, Lannion;ETIS-CNRS-Universite de Cergy Pontoise-ENSEA, Cergy Pontoise

  • Venue:
  • MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
  • Year:
  • 2011

Abstract

In multi-relational data mining, data are represented in relational form: the individuals of the target table may be related to several records in secondary tables through one-to-many relationships. To cope with this one-to-many setting, most existing approaches transform the multi-table representation into a single table, typically by propositionalisation, thereby losing the naturally compact initial representation and possibly introducing statistical bias. Our approach instead aims to evaluate directly, over the relational domain, the informativeness of the original input variables with respect to the target variable. The idea is to summarize, for each individual, the information contained in a variable of a non-target table by a tuple of features representing the cardinalities of the variable's initial modalities, that is, the number of related records taking each value. Multivariate grid models are then used to assess the joint information brought by these new features, which is equivalent to estimating the conditional density of the target variable given the input variable in the non-target table. Preliminary experiments on artificial and real data sets show that the approach can identify relevant one-to-many variables. In this article, we focus on binary variables because of space constraints.
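
The summarization step described in the abstract can be illustrated with a minimal Python sketch. It assumes a binary secondary variable and toy data; the function names, the record layout (pairs of individual id and value), and the data are all hypothetical and are not taken from the paper. The second function is only a crude stand-in for the multivariate grid models: it uses one cell per distinct count tuple and raw frequencies, whereas the paper's grid models choose the partition of the count space themselves.

```python
from collections import Counter, defaultdict


def cardinality_features(secondary_rows, modalities=(0, 1)):
    """Map each individual id to a tuple of counts, one count per modality.

    secondary_rows: iterable of (individual_id, value) pairs, i.e. the
    records of a hypothetical secondary table for a single variable.
    """
    counts = defaultdict(Counter)
    for individual_id, value in secondary_rows:
        counts[individual_id][value] += 1
    # A Counter returns 0 for a modality that never occurs.
    return {
        ind: tuple(c[m] for m in modalities)
        for ind, c in counts.items()
    }


def conditional_target_frequencies(features, targets):
    """Empirical frequency of each target value per distinct count tuple.

    Assumption: one cell per distinct tuple, not an optimized grid.
    """
    cells = defaultdict(Counter)
    for ind, feat in features.items():
        cells[feat][targets[ind]] += 1
    return {
        feat: {t: n / sum(c.values()) for t, n in c.items()}
        for feat, c in cells.items()
    }


# Hypothetical toy data: three individuals and their secondary records.
secondary_rows = [
    ("a", 1), ("a", 1), ("a", 0),
    ("b", 0), ("b", 0),
    ("c", 1), ("c", 1), ("c", 0),
]
targets = {"a": "pos", "b": "neg", "c": "pos"}

features = cardinality_features(secondary_rows)
freqs = conditional_target_frequencies(features, targets)
```

In this sketch each individual is reduced to a pair (number of records with value 0, number of records with value 1). Individuals with no secondary records do not appear in the output; a fuller version would map them to a tuple of zeros.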