In supervised machine learning, some algorithms are restricted to discrete data and need to discretize continuous attributes. The Khiops discretization method, based on chi-square statistics, optimizes the chi-square criterion globally over the whole discretization domain. In this paper, we propose a major evolution of the Khiops algorithm, which provides guarantees against overfitting and thus significantly improves the robustness of the discretizations. This enhancement is based on a statistical model of the Khiops algorithm, derived from a study of the variations of the chi-square value during the discretization process. This model, validated experimentally, makes it possible to modify the algorithm so that overfitting is genuinely controlled. Extensive experiments demonstrate the validity of the approach and show that the Khiops method builds high-quality discretizations, in terms of both accuracy and a small number of intervals.
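To make the chi-square criterion concrete, the sketch below computes the Pearson chi-square statistic of an intervals-by-classes count table and finds the adjacent merge that keeps that global value highest. It is an illustration only, not the Khiops algorithm or its overfitting control: the function names (`chi_square`, `merge_adjacent`, `best_merge`) and the toy counts are assumptions introduced here, and the stopping rule that decides when to stop merging is deliberately left out.

```python
from typing import List

Table = List[List[int]]  # one row per interval, one column per class


def chi_square(table: Table) -> float:
    """Pearson chi-square statistic of an intervals-by-classes count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


def merge_adjacent(table: Table, i: int) -> Table:
    """Return a new table in which intervals i and i + 1 are merged."""
    merged = [a + b for a, b in zip(table[i], table[i + 1])]
    return table[:i] + [merged] + table[i + 2:]


def best_merge(table: Table) -> int:
    """Index of the adjacent merge that leaves the global chi-square highest."""
    return max(
        range(len(table) - 1),
        key=lambda i: chi_square(merge_adjacent(table, i)),
    )


# Hypothetical toy data: three intervals, two classes.
toy = [[10, 0], [9, 1], [0, 10]]
i = best_merge(toy)
print(i, chi_square(toy), chi_square(merge_adjacent(toy, i)))
```

On this toy table the first two intervals have similar class distributions, so merging them costs the least chi-square. A bottom-up discretizer in this spirit would repeat such merges until a stopping criterion is met.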