A fresh look is taken at the problem of bias in the information-based attribute selection measures used in the induction of decision trees. Statistical simulation is used to demonstrate that the usual measures, namely information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991), are all biased in favour of attributes with large numbers of values. It is concluded that approaches based on the chi-square distribution are preferable, because they compensate automatically for differences between attributes in the number of levels (distinct values) they take.
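The kind of simulation described above can be illustrated with a minimal sketch. This is not the authors' experiment: the function names, the sample size, the number of trials, the two-class setup, and the uniform sampling of attribute values are all assumptions chosen for illustration, and only information gain and the Pearson chi-square statistic are covered (gain ratio and the Lopez de Mantaras measure are omitted). Each trial draws an attribute that is independent of the class, so any measured "gain" is pure noise. If the bias is present, the average information gain should rise with the number of attribute values, while the chi-square statistic divided by its degrees of freedom should stay close to 1.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(attr, cls):
    """Information gain (bits) of the class labels given the attribute."""
    n = len(cls)
    joint = Counter(zip(attr, cls))
    attr_counts = Counter(attr)
    cls_counts = Counter(cls)
    conditional = 0.0
    for a, n_a in attr_counts.items():
        branch = [joint[(a, c)] for c in cls_counts]
        conditional += n_a / n * entropy(branch)
    return entropy(list(cls_counts.values())) - conditional


def chi_square(attr, cls):
    """Pearson chi-square statistic and degrees of freedom of the table."""
    n = len(cls)
    joint = Counter(zip(attr, cls))
    attr_counts = Counter(attr)
    cls_counts = Counter(cls)
    stat = 0.0
    for a, n_a in attr_counts.items():
        for c, n_c in cls_counts.items():
            expected = n_a * n_c / n
            stat += (joint[(a, c)] - expected) ** 2 / expected
    dof = (len(attr_counts) - 1) * (len(cls_counts) - 1)
    return stat, dof


def simulate(num_values, n_samples=200, n_trials=500, seed=0):
    """Average gain and average chi-square/dof for an uninformative attribute.

    The attribute and the class are sampled independently (an assumption of
    this sketch), so the attribute carries no real information.
    """
    rng = random.Random(seed)
    gains, ratios = [], []
    for _ in range(n_trials):
        cls = [rng.randrange(2) for _ in range(n_samples)]
        attr = [rng.randrange(num_values) for _ in range(n_samples)]
        gains.append(information_gain(attr, cls))
        stat, dof = chi_square(attr, cls)
        ratios.append(stat / dof)
    return sum(gains) / n_trials, sum(ratios) / n_trials


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        mean_gain, mean_ratio = simulate(k)
        print(f"values={k:2d}  mean gain={mean_gain:.4f}  "
              f"mean chi2/dof={mean_ratio:.3f}")
```

Dividing the statistic by its degrees of freedom is only a rough stand-in for referring it to the chi-square distribution. A fuller treatment would convert each statistic to a p-value using the appropriate degrees of freedom, which is what lets attributes with different numbers of values be compared on a common scale.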