Supervised selection of dynamic features, with an application to telecommunication data preparation

Authors:
Sylvain Ferrandiz;Marc Boullé
Affiliations:
France Télécom R&D, LANNION Cedex, France;France Télécom R&D, LANNION Cedex, France
Venue:
ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining
Year:
2006

Abstract

In the field of data mining, data preparation is increasingly becoming a bottleneck. Indeed, collecting and storing data keeps getting cheaper while modelling costs remain unchanged. As a result, feature selection is now routinely performed. In the data preparation step, selection often relies on feature ranking. In the supervised classification context, ranking is based on the information that each explanatory feature provides about the categorical target attribute. With the increasing presence in databases of features measured over time, i.e. dynamic features, new supervised ranking methods have to be designed. In this paper, we propose a new method to evaluate dynamic features, which is derived from a probabilistic criterion. The criterion is non-parametric and automatically handles the problem of overfitting the data, so the resulting evaluation is reliable. Furthermore, the design of the criterion relies on a simple and understandable approach. This makes it possible to provide a meaningful visualization of the evaluation, in addition to the computed score. The advantages of the new method are illustrated on a telecommunication dataset.
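
To make the notion of supervised feature ranking concrete, here is a minimal sketch that ranks static categorical features by their empirical mutual information with the class label. It is not the probabilistic criterion proposed in the paper, which the abstract does not specify. The function names (`mutual_information`, `rank_features`) and the toy data are assumptions introduced purely for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(feature_values, labels):
    """Empirical mutual information (in bits) between a categorical feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))  # counts of (value, class) pairs
    px = Counter(feature_values)                  # marginal counts of feature values
    py = Counter(labels)                          # marginal counts of classes
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def rank_features(features, labels):
    """Return (names sorted from most to least informative, score per name)."""
    scores = {name: mutual_information(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data, made up for illustration: 'plan' determines the class, 'region' does not.
labels = ["churn", "churn", "stay", "stay"]
features = {
    "region": ["north", "south", "north", "south"],
    "plan": ["prepaid", "prepaid", "contract", "contract"],
}
ranking, scores = rank_features(features, labels)
print(ranking)  # ['plan', 'region']
```

Unlike the criterion described in the abstract, this plain estimate has no safeguard against overfitting and applies only to static categorical features. A dynamic feature (a measurement over time) would first need a representation, such as a partition of the curves, before any score of this kind could be computed.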