In supervised machine learning, partitioning the values of a categorical attribute (also called grouping) aims at constructing a new synthetic attribute that retains the information of the initial attribute while reducing the number of its values. When the number of values is very large, the risk of overfitting the data increases sharply and building good groupings becomes difficult. In this paper, we propose two new grouping methods founded on a Bayesian approach, leading to Bayes optimal groupings. The first method exploits a standard schema for grouping models, and the second extends this schema by managing a “garbage” group dedicated to the least frequent values. Extensive comparative experiments demonstrate that the new grouping methods build high-quality groupings in terms of predictive quality, robustness, and a small number of groups.
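
To make the idea of a "garbage" group concrete, the following minimal sketch maps every rarely observed value of a categorical attribute to one shared group. It is only an illustration under stated assumptions: it uses a fixed frequency threshold rather than the Bayesian criterion proposed in the paper, and the function name `group_with_garbage`, the parameter `min_count`, and the label `"GARBAGE"` are hypothetical.

```python
from collections import Counter


def group_with_garbage(values, min_count=2, garbage_label="GARBAGE"):
    """Map each distinct value to itself, or to a shared garbage group if rare.

    Assumption: a plain frequency threshold (min_count) decides which values
    are "least frequent"; the paper's Bayesian selection is not reproduced here.
    """
    counts = Counter(values)
    return {
        value: (value if count >= min_count else garbage_label)
        for value, count in counts.items()
    }


# Toy attribute with two frequent values and two rare ones.
values = ["red", "red", "blue", "blue", "blue", "teal", "mauve"]
mapping = group_with_garbage(values, min_count=2)

# The synthetic attribute has fewer distinct values than the original one.
grouped = [mapping[v] for v in values]
```

In this toy case the four original values collapse into three groups, since both values seen only once fall into the garbage group. A full grouping method would also merge frequent values with similar class distributions, which this sketch does not attempt.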