Optimal outlier removal in high-dimensional spaces

Authors:
John Dunagan;Santosh Vempala
Affiliations:
Department of Mathematics, MIT, Cambridge, MA;Department of Mathematics, MIT, Cambridge, MA
Venue:
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Year:
2001

Citing 4

Matrix analysis
Matrix analysis

Random walks and an O*(n^5) volume algorithm for convex bodies
Random Structures & Algorithms

Semi-definite relaxations for minimum bandwidth and other vertex-ordering problems
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

A polynomial-time algorithm for learning noisy linear threshold functions
FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science

Cited 5

Shape fitting with outliers
Proceedings of the nineteenth annual symposium on Computational geometry

Optimal outlier removal in high-dimensional spaces
Journal of Computer and System Sciences - STOC 2001

Clustering via minimum volume ellipsoids
Computational Optimization and Applications

A discriminative model for semi-supervised learning
Journal of the ACM (JACM)

A PAC-Style model for learning from labeled and unlabeled data
COLT'05 Proceedings of the 18th annual conference on Learning Theory

Abstract

We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in $n$-dimensional Euclidean space. A point $x$ is defined to be a $\beta$-outlier if there exists some direction $w$ in which its squared distance from the mean along $w$ is greater than $\beta$ times the average squared distance from the mean along $w$ [1]. Our main theorem is that for any $\epsilon > 0$, there exists a $(1-\epsilon)$ fraction of the original distribution that has no $O\left(\frac{n}{\epsilon}\left(b + \log \frac{n}{\epsilon}\right)\right)$-outliers, improving on the previous bound of $O(n^7 b/\epsilon)$. This bound is shown to be nearly the best possible. The theorem is constructive, and results in a $\frac{1}{1-\epsilon}$-approximation to the following optimization problem: given a distribution $\mu$ (i.e. the ability to sample from it), and a parameter $\epsilon > 0$, find the minimum $\beta$ for which there exists a subset of probability at least $(1-\epsilon)$ with no $\beta$-outliers.
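
The outlier definition in the abstract can be checked numerically. What follows is a minimal sketch, not the paper's algorithm: it assumes NumPy and a finite sample standing in for the distribution, and the function names `outlier_levels` and `is_beta_outlier` are hypothetical. It relies on the standard fact that the largest ratio of squared deviation along a direction to average squared deviation along that direction, taken over all directions, equals the squared Mahalanobis distance from the mean, so no search over directions is needed.

```python
import numpy as np


def outlier_levels(points):
    """For each row x, return the maximum over directions w of
    (w . (x - mean))^2 / average_y (w . (y - mean))^2.

    A point is a beta-outlier exactly when this value exceeds beta.
    """
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    # Second-moment matrix about the mean; dividing by N matches the
    # "average squared distance" in the definition.
    second_moment = centered.T @ centered / len(X)
    # max_w (w.d)^2 / (w^T M w) = d^T M^{-1} d. The pseudo-inverse keeps
    # this defined when M is singular (an assumption of this sketch).
    inverse = np.linalg.pinv(second_moment)
    return np.einsum("ij,jk,ik->i", centered, inverse, centered)


def is_beta_outlier(points, beta):
    """Boolean mask marking the rows that are beta-outliers."""
    return outlier_levels(points) > beta


if __name__ == "__main__":
    # Hypothetical data: a unit square plus one far-away point.
    sample = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]]
    print(outlier_levels(sample))
    print(is_beta_outlier(sample, beta=3.5))
```

The levels of a full-rank sample always average to the dimension, which is one way to see why any bound on the outlier level must grow at least linearly with the dimension.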