Informative variables selection for multi-relational supervised learning

Authors:
Dhafer Lahbib;Marc Boullé;Dominique Laurent
Affiliations:
France Telecom R&D - 2, Lannion and ETIS-CNRS-Universite de Cergy Pontoise-ENSEA, Cergy Pontoise;France Telecom R&D - 2, Lannion;ETIS-CNRS-Universite de Cergy Pontoise-ENSEA, Cergy Pontoise
Venue:
MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
Year:
2011

Abstract

In multi-relational data mining, data are represented in relational form: the individuals of the target table may be related to several records in secondary tables through one-to-many relationships. To cope with this one-to-many setting, most existing approaches transform the multi-table representation, typically by propositionalisation, thereby losing the naturally compact initial representation and possibly introducing statistical bias. Our approach aims to evaluate directly the informativeness of the original input variables over the relational domain with respect to the target variable. The idea is to summarize, for each individual, the information contained in a variable of a non-target table by a tuple of features representing the cardinalities of its initial modalities. Multivariate grid models are used to assess the joint information brought by the new features, which is equivalent to estimating the conditional density of the target variable given the input variable in the non-target table. Preliminary experiments on artificial and real data sets show that the approach can identify relevant one-to-many variables. In this article, we focus on binary variables because of space constraints.
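
The summarization step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the fixed interval bounds are all hypothetical. In particular, the grid here uses hand-picked bounds and plain frequency counts as a stand-in, whereas the paper relies on multivariate grid models whose partition is selected by the method itself. For a binary secondary variable, each individual is mapped to the pair (number of records with value 0, number of records with value 1), and the target distribution is then estimated within each grid cell.

```python
from collections import Counter, defaultdict


def cardinality_features(records, modalities=(0, 1)):
    """Map each individual id to a tuple of counts, one count per modality.

    `records` is an iterable of (individual_id, value) pairs taken from a
    secondary table (hypothetical layout, assumed for illustration).
    """
    counts = defaultdict(Counter)
    for individual_id, value in records:
        counts[individual_id][value] += 1
    return {i: tuple(c[m] for m in modalities) for i, c in counts.items()}


def grid_class_frequencies(features, targets, bounds):
    """Estimate P(target | grid cell) by simple counting.

    `bounds` gives, for each feature dimension, the interval boundaries of a
    fixed grid. This is an assumption of the sketch: it does not search for
    the best grid.
    """
    def cell(point):
        # Index of the interval that each coordinate falls into.
        return tuple(
            sum(x >= b for b in dim_bounds)
            for x, dim_bounds in zip(point, bounds)
        )

    cell_counts = defaultdict(Counter)
    for individual_id, point in features.items():
        cell_counts[cell(point)][targets[individual_id]] += 1
    return {
        c: {t: n / sum(cnt.values()) for t, n in cnt.items()}
        for c, cnt in cell_counts.items()
    }


# Hypothetical toy data: three individuals and one binary secondary variable.
records = [("c1", 1), ("c1", 1), ("c1", 0), ("c2", 0), ("c2", 0), ("c3", 1)]
targets = {"c1": "yes", "c2": "no", "c3": "yes"}

features = cardinality_features(records)
freqs = grid_class_frequencies(features, targets, bounds=[(2,), (1,)])
```

In this sketch, individuals with no record in the secondary table do not appear in `features`; a fuller version would assign them the all-zero tuple.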