We study the problem of finding the most uniform partition of the class label distribution on an interval. This problem arises, for example, in supervised discretization of continuous features, where an evaluation heuristic must locate the best split point of the current feature. The weighted average of the empirical entropies of the interval label distributions is commonly used for this task. We observe that this rule is suboptimal because it favors short intervals too strongly. We therefore study alternative approaches; a compression-based solution turns out to be the best in our empirical experiments. We also study how these alternative methods affect the performance of classification algorithms.
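For concreteness, here is a minimal sketch of the weighted-entropy split criterion that the abstract identifies as suboptimal (not the compression-based alternative it proposes). The function names and the two-class example data are illustrative, and the labels are assumed to be pre-sorted by the continuous feature value:

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy (in bits) of a multiset of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def weighted_split_entropy(labels, split):
    """Weighted average of the empirical entropies of the two interval
    label distributions induced by cutting the sequence at `split`."""
    left, right = labels[:split], labels[split:]
    n = len(labels)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

# Illustrative example: class labels ordered by the feature being discretized.
labels = ['a', 'a', 'b', 'a', 'b', 'b', 'b', 'a']

# The heuristic picks the cut point minimizing the weighted average entropy.
best = min(range(1, len(labels)), key=lambda s: weighted_split_entropy(labels, s))
print(best, weighted_split_entropy(labels, best))
```

Because each side's entropy is weighted by its relative size, a very short interval that happens to be pure contributes almost nothing to the average, which is one way to see how the criterion can end up preferring short intervals too much.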