Minimax-optimal classification with dyadic decision trees

Authors:
C. Scott; R. D. Nowak
Affiliations:
Dept. of Stat., Rice Univ., Houston, TX, USA;-
Venue:
IEEE Transactions on Information Theory
Year:
2006

Abstract

Decision trees are among the most popular types of classifiers, with interpretability and ease of implementation being among their chief attributes. Despite the widespread use of decision trees, theoretical analysis of their performance has only begun to emerge in recent years. In this paper, it is shown that a new family of decision trees, dyadic decision trees (DDTs), attains nearly optimal (in a minimax sense) rates of convergence for a broad range of classification problems. Furthermore, DDTs are surprisingly adaptive in three important respects: they automatically 1) adapt to favorable conditions near the Bayes decision boundary; 2) focus on data distributed on lower-dimensional manifolds; and 3) reject irrelevant features. DDTs are constructed by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly with computational complexity that is nearly linear in the training sample size. DDTs comprise the first classifiers known to achieve nearly optimal rates for the diverse class of distributions studied here while also being practical and implementable. This is also the first study (of which we are aware) to consider rates for adaptation to intrinsic data dimension and relevant features.
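To make the idea of penalized empirical risk minimization over dyadic trees concrete, the following is a minimal sketch and not the authors' algorithm. It assumes features scaled to the unit cube and labels in {0, 1}; it substitutes a constant per-leaf penalty `lam` for the paper's data-dependent penalty; and it searches exhaustively by recursion rather than with the nearly linear-time procedure the abstract mentions. The names `fit_ddt`, `predict`, `max_depth`, and `lam` are hypothetical.

```python
import random


def fit_ddt(X, y, max_depth=4, lam=0.01):
    """Minimize (training errors / n) + lam * (number of leaves) over
    dyadic trees: every split halves a cell at its midpoint along one axis.
    ASSUMPTION: the constant per-leaf penalty stands in for the paper's
    data-dependent penalty, which is not reproduced here."""
    n, d = len(X), len(X[0])

    def best(idx, lo, hi, depth):
        # Cost of turning this cell into a leaf with the majority label.
        ones = sum(y[i] for i in idx)
        errors = min(ones, len(idx) - ones)
        label = 1 if 2 * ones > len(idx) else 0
        cost, tree = errors / n + lam, ("leaf", label)
        # A pure (or empty) cell cannot be improved by splitting further.
        if depth == max_depth or errors == 0:
            return cost, tree
        for j in range(d):
            mid = 0.5 * (lo[j] + hi[j])
            left = [i for i in idx if X[i][j] < mid]
            right = [i for i in idx if X[i][j] >= mid]
            hi_left = hi[:j] + [mid] + hi[j + 1:]
            lo_right = lo[:j] + [mid] + lo[j + 1:]
            cost_l, tree_l = best(left, lo, hi_left, depth + 1)
            cost_r, tree_r = best(right, lo_right, hi, depth + 1)
            if cost_l + cost_r < cost:
                cost = cost_l + cost_r
                tree = ("split", j, mid, tree_l, tree_r)
        return cost, tree

    return best(list(range(n)), [0.0] * d, [1.0] * d, 0)


def predict(tree, x):
    """Follow midpoint splits down to a leaf and return its label."""
    while tree[0] == "split":
        _, j, mid, left, right = tree
        tree = left if x[j] < mid else right
    return tree[1]


# Toy data: the label depends only on the first coordinate;
# the second coordinate is an irrelevant feature.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x[0] >= 0.5 else 0 for x in X]

cost, tree = fit_ddt(X, y, max_depth=3, lam=0.01)
print(tree)
```

On this toy problem the selected tree splits once on the first coordinate and never on the second, since each extra leaf adds `lam` to the objective without reducing the training error. This only illustrates the mechanics of the penalty and says nothing about the convergence rates established in the paper.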