Feature selection in an electric billing database considering attribute inter-dependencies

Authors:
Manuel Mejía-Lavalle;Eduardo F. Morales
Affiliations:
Instituto de Investigaciones Eléctricas, Cuernavaca, Morelos, México;INAOE, Sta. Ma. Tonantzintla, Puebla, México
Venue:
ICDM'06 Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining
Year:
2006

Abstract

With the increasing size of databases, feature selection has become a relevant and challenging problem in knowledge discovery in databases. An effective feature selection strategy can significantly reduce data mining processing time, improve predictive accuracy, and make the induced models easier to understand, as they tend to be smaller and more meaningful to the user. Many feature selection algorithms assume that the attributes are independent of each other given the class, which can produce models with redundant attributes and/or exclude sets of attributes that are relevant only when considered together. In this paper, an effective best-first search algorithm for feature selection, called buBF, is described. buBF uses a novel heuristic function based on n-way entropy to capture inter-dependencies among variables. It is shown that buBF produces more accurate models than other state-of-the-art feature selection algorithms when compared on several real and synthetic datasets. Specifically, we apply buBF to a Mexican electric billing database and obtain satisfactory results.
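The point about attributes that are relevant only when considered together can be illustrated with a small sketch. It is not the buBF heuristic, which the abstract does not specify; it only estimates the conditional entropy of the class given a set of attributes, on a hypothetical toy dataset where the class is the XOR of two attributes. The function name `conditional_entropy` and the data layout are assumptions made for this illustration.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, attr_idx, class_idx=-1):
    """Estimate H(class | attributes in attr_idx) in bits from raw counts."""
    n = len(rows)
    # Count joint occurrences of (attribute values, class) and of attribute values alone.
    joint = Counter((tuple(r[i] for i in attr_idx), r[class_idx]) for r in rows)
    marginal = Counter(tuple(r[i] for i in attr_idx) for r in rows)
    h = 0.0
    for (key, _cls), count in joint.items():
        # p(key, class) * log2 p(class | key)
        h -= (count / n) * log2(count / marginal[key])
    return h


# Hypothetical toy data: columns are (a, b, noise, class), with class = a XOR b.
rows = [(a, b, noise, a ^ b) for a in (0, 1) for b in (0, 1) for noise in (0, 1)]

h_a = conditional_entropy(rows, [0])      # a alone leaves the class fully uncertain
h_b = conditional_entropy(rows, [1])      # so does b alone
h_ab = conditional_entropy(rows, [0, 1])  # a and b together determine the class
```

In this toy case each attribute evaluated alone looks as uninformative as the noise column, while the pair removes all uncertainty about the class. A ranking that scores attributes one at a time would therefore discard both, which is the failure mode a heuristic based on joint (n-way) entropy is meant to avoid.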