Optimal Partitioning for Classification and Regression Trees
IEEE Transactions on Pattern Analysis and Machine Intelligence
C4.5: programs for machine learning
SLIQ: A Fast Scalable Classifier for Data Mining
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Applications of information theory to pattern recognition and the design of decision trees and trellises
ICASSP '91 Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing
Towards an effective cooperation of the user and the computer for classification
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Linear-Time Preprocessing in Optimal Numerical Range Partitioning
Journal of Intelligent Information Systems - Special issue: A survey of research questions for intelligent information systems in education
Generalized Entropy and Projection Clustering of Categorical Data
PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates
Data Mining and Knowledge Discovery
JRV: an interactive tool for data mining visualization
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
On the Computational Complexity of Optimal Multisplitting
Fundamenta Informaticae - Intelligent Systems
Data mining on multimedia data
How to interpret decision trees?
ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
To find the optimal branching of a nominal attribute at a node in an L-ary decision tree, one is often forced to search over all possible L-ary partitions for the one that yields the minimum impurity measure. For binary trees (L = 2), when there are just two classes, a short-cut search is possible that is linear in n, the number of distinct values of the attribute. For the general case, in which the number of classes, k, may be greater than two, Burshtein et al. have shown that the optimal partition satisfies a condition that involves the existence of (L choose 2) hyperplanes in the class probability space. We derive a property of the optimal partition for concave impurity measures (including, in particular, the Gini and entropy impurity measures) in terms of the existence of L vectors in the dual of the class probability space, which implies the earlier condition.

Unfortunately, these insights still do not offer a practical search method when n and k are large, even for binary trees. We therefore present a new heuristic search algorithm to find a good partition. It is based on ordering the attribute's values according to their principal component scores in the class probability space, and is linear in n. We demonstrate the effectiveness of the new method through Monte Carlo simulation experiments and compare its performance against other heuristic methods.
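The heuristic the abstract describes can be sketched roughly as follows: represent each distinct attribute value by its empirical class-probability vector, project those vectors onto their first principal component, sort the values by score, and then scan the n-1 prefix splits of that ordering for the one with minimum weighted Gini impurity. This is a minimal illustrative sketch, not the authors' implementation; the function name `pca_split` and all identifiers are ours.

```python
import numpy as np

def pca_split(values, labels, n_classes):
    """Heuristic binary partition of a nominal attribute (sketch of the
    PCA-ordering idea; hypothetical helper, not the authors' code).
    labels are integer class ids in 0..n_classes-1."""
    vals = sorted(set(values))
    idx = {v: i for i, v in enumerate(vals)}
    counts = np.zeros((len(vals), n_classes))
    for v, y in zip(values, labels):
        counts[idx[v], y] += 1

    # Class-probability vector for each distinct attribute value.
    probs = counts / counts.sum(axis=1, keepdims=True)

    # First principal component score of each value in class-probability space.
    centered = probs - probs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    order = np.argsort(scores)

    def gini(c):
        n = c.sum()
        return 1.0 - ((c / n) ** 2).sum() if n else 0.0

    # Linear scan over the n-1 prefix splits of the PCA ordering.
    total, N = counts.sum(axis=0), counts.sum()
    best_imp, best_left = np.inf, None
    left = np.zeros(n_classes)
    for pos in range(len(vals) - 1):
        left += counts[order[pos]]
        right = total - left
        imp = (left.sum() * gini(left) + right.sum() * gini(right)) / N
        if imp < best_imp:
            best_imp = imp
            best_left = {vals[j] for j in order[:pos + 1]}
    return best_imp, best_left
```

For two classes the first principal component reduces (up to sign) to ordering by the probability of one class, recovering the classical linear short-cut; for k > 2 the ordering is only a heuristic, which is why the paper evaluates it by simulation.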