MODL: A Bayes optimal discretization method for continuous attributes

Authors:
Marc Boullé
Affiliations:
France Telecom R&D, Lannion, France 22300
Venue:
Machine Learning
Year:
2006

Citing 11 (works cited by this paper, listed first)
Cited 15 (works citing this paper, listed second)

On the Handling of Continuous-Valued Attributes in Decision Tree Generation

Machine Learning
C4.5: programs for machine learning

C4.5: programs for machine learning
Very Simple Classification Rules Perform Well on Most Commonly Used Datasets

Machine Learning
General and Efficient Multisplitting of Numerical Attributes

Machine Learning
FUSINTER: a method for discretization of continuous attributes

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Multivariate discretization for set mining

Knowledge and Information Systems
Discretization: An Enabling Technique

Data Mining and Knowledge Discovery
On Changing Continuous Attributes into Ordered Discrete Attributes

EWSL '91 Proceedings of the European Working Session on Machine Learning
Khiops: A Statistical Discretization Method of Continuous Attributes

Machine Learning
Khiops: a discretization method of continuous attributes with guaranteed resistance to noise

MLDM'03 Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition
Minimum description length induction, Bayesianism, and Kolmogorov complexity

IEEE Transactions on Information Theory

Tracking Web spam with HTML style similarities

ACM Transactions on the Web (TWEB)
A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning

The Journal of Machine Learning Research
Improved Comprehensibility and Reliability of Explanations via Restricted Halfspace Discretization

MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Using Resampling Techniques for Better Quality Discretization

MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
A Parameter-Free Classification Method for Large Scale Learning

The Journal of Machine Learning Research
On improving discretization quality by a bagging technique

ICNC'09 Proceedings of the 5th international conference on Natural computation
The orange customer analysis platform

ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
Modelling complex data by learning which variable to construct

DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
Review:

The Knowledge Engineering Review
Informative variables selection for multi-relational supervised learning

MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
Optimal bayesian 2d-discretization for variable ranking in regression

DS'06 Proceedings of the 9th international conference on Discovery Science
A bayesian approach for classification rule mining in quantitative databases

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
UniDis: a universal discretization technique

Journal of Intelligent Information Systems
Examination and comparison of conflicting data in granulated datasets: Equal width interval vs. equal frequency interval

Information Sciences: an International Journal
Specific-class distance measures for nominal attributes

AI Communications

Abstract

While real data often comes in a mixed format, discrete and continuous, many supervised induction algorithms require discrete data. Efficient discretization of continuous attributes is an important problem that affects the speed, accuracy and understandability of the induction models. In this paper, we propose a new discretization method, MODL, founded on a Bayesian approach. We introduce a space of discretization models and a prior distribution defined on this model space. This results in the definition of a Bayes optimal evaluation criterion for discretizations. We then propose a new super-linear optimization algorithm that finds near-optimal discretizations. Extensive comparative experiments on both real and synthetic data demonstrate the high inductive performance obtained by the new discretization method.
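
The abstract does not spell out the evaluation criterion, so the following is only a minimal sketch of a criterion of this kind, not the paper's definitive formulation. It assumes the commonly cited form of the MODL cost: a uniform prior on the number of intervals, a uniform prior on the interval bounds, a uniform prior on the class distribution within each interval, and a multinomial likelihood term. The function names `log_binomial` and `discretization_cost` are hypothetical and chosen for illustration; the exact terms should be checked against the paper before use.

```python
from math import lgamma, log


def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def discretization_cost(interval_counts):
    """Negative log posterior (up to a constant) of a discretization.

    interval_counts: one list of class counts per interval, e.g.
    [[10, 0], [0, 10]] for two intervals and two classes.
    Assumed form of the criterion (not stated in the abstract); lower is better.
    """
    n_intervals = len(interval_counts)
    n_classes = len(interval_counts[0])
    n_total = sum(sum(counts) for counts in interval_counts)

    # Prior on the number of intervals: uniform between 1 and n_total.
    cost = log(n_total)
    # Prior on the interval bounds, given the number of intervals.
    cost += log_binomial(n_total + n_intervals - 1, n_intervals - 1)
    for counts in interval_counts:
        n_i = sum(counts)
        # Prior on the class distribution inside this interval.
        cost += log_binomial(n_i + n_classes - 1, n_classes - 1)
        # Likelihood term: log of the multinomial coefficient.
        cost += lgamma(n_i + 1) - sum(lgamma(c + 1) for c in counts)
    return cost


# Two candidate discretizations of a perfectly separable attribute.
separable_split = [[10, 0], [0, 10]]
separable_merged = [[10, 10]]

# Two candidate discretizations of a pure-noise attribute.
noise_split = [[5, 5], [5, 5]]
noise_merged = [[10, 10]]

print(discretization_cost(separable_split) < discretization_cost(separable_merged))
print(discretization_cost(noise_split) > discretization_cost(noise_merged))
```

Under these assumptions, splitting is rewarded only when it makes the class distribution inside the intervals purer; on a pure-noise attribute the extra prior cost of a second interval outweighs any gain, so the single-interval model is preferred. The optimization algorithm mentioned in the abstract, which searches over candidate discretizations, is not sketched here.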