Laplacian Eigenmaps for dimensionality reduction and data representation
Neural Computation
Bounded Geometries, Fractals, and Low-Distortion Embeddings
FOCS '03: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Nearest-neighbor-preserving embeddings
ACM Transactions on Algorithms (TALG)
Risk Bounds for Random Regression Graphs
Foundations of Computational Mathematics
Finding the Homology of Submanifolds with High Confidence from Random Samples
Discrete & Computational Geometry
Random projection trees and low dimensional manifolds
STOC '08: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing
Online Manifold Regularization: A New Learning Setting and Empirical Study
ECML PKDD '08: Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases, Part I
Which spatial partition trees are adaptive to intrinsic dimension?
UAI '09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Model selection for CART regression trees
IEEE Transactions on Information Theory
Minimax-optimal classification with dyadic decision trees
IEEE Transactions on Information Theory
Rates of convergence of nearest neighbor estimation under arbitrary sampling
IEEE Transactions on Information Theory
Efficient regression in metric spaces via approximate Lipschitz extension
SIMBAD '13: Proceedings of the Second International Conference on Similarity-Based Pattern Recognition
We consider the problem of nonparametric regression: learning an arbitrary mapping f : X → Y from a data set of (x, y) pairs in which the y values are corrupted by noise of mean zero. This statistical task is known to be subject to a severe curse of dimensionality: if X ⊆ R^D, and if the only smoothness assumption on f is that it satisfies a Lipschitz condition, then any estimator based on n data points will have an error rate (risk) of Ω(n^{-2/(2+D)}). Here we present a tree-based regressor whose risk depends only on the doubling dimension of X, not on D. This notion of dimension generalizes two cases of contemporary interest: when X is a low-dimensional manifold, and when X is sparse. The tree is built using random hyperplanes as splitting criteria, building upon the recent work of Dasgupta and Freund (2008) [5]; and we show that axis-parallel splits cannot achieve the same finite-sample rate of convergence.
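To make the construction concrete, here is a minimal sketch of a regressor in that spirit: recursively split the training set with a random hyperplane (project onto a random unit direction, cut at the median projection) and predict with the mean y value of the leaf containing the query. This is only an illustration of the random-hyperplane splitting idea, not the authors' exact estimator or analysis; the names RPTreeRegressor and min_leaf_size are hypothetical, and details such as the stopping rule are assumptions.

    import numpy as np

    class RPTreeRegressor:
        """Sketch: regression tree with random-hyperplane (RP) splits.

        Each internal node projects points onto a random unit direction
        and splits at the median projection; each leaf predicts the mean
        y of the training points that reached it. (Illustrative only;
        the paper's regressor and guarantees differ in the details.)
        """

        def __init__(self, min_leaf_size=10, rng=None):
            self.min_leaf_size = min_leaf_size   # stop splitting below this size (assumed rule)
            self.rng = rng or np.random.default_rng()

        def fit(self, X, y):
            self.root_ = self._build(np.asarray(X, float), np.asarray(y, float))
            return self

        def _build(self, X, y):
            if len(y) <= self.min_leaf_size:
                return {"leaf": True, "value": y.mean()}
            d = self.rng.standard_normal(X.shape[1])
            d /= np.linalg.norm(d)               # random unit direction
            proj = X @ d
            t = np.median(proj)                  # split at the median projection
            left = proj <= t
            if left.all() or not left.any():     # degenerate split: make a leaf
                return {"leaf": True, "value": y.mean()}
            return {"leaf": False, "dir": d, "thresh": t,
                    "left": self._build(X[left], y[left]),
                    "right": self._build(X[~left], y[~left])}

        def predict(self, X):
            return np.array([self._predict_one(x) for x in np.asarray(X, float)])

        def _predict_one(self, x):
            node = self.root_
            while not node["leaf"]:
                node = node["left"] if x @ node["dir"] <= node["thresh"] else node["right"]
            return node["value"]

    # Toy usage: data lying near a 1-D manifold (a circle) embedded in R^20,
    # the kind of low-doubling-dimension setting the abstract describes.
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 1, size=(2000, 1))
    X = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                   np.zeros((2000, 18))])        # intrinsic dimension ~1, ambient D = 20
    y = np.sin(4 * np.pi * t[:, 0]) + 0.1 * rng.standard_normal(2000)
    model = RPTreeRegressor(min_leaf_size=20, rng=rng).fit(X, y)
    print(model.predict(X[:5]))

The random direction makes the splits adapt to whatever low-dimensional structure the data has, whereas an axis-parallel split can only cut along the D ambient coordinates, which is the contrast the abstract's lower bound formalizes.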