Optimal outlier removal in high-dimensional spaces

Authors:
John Dunagan;Santosh Vempala
Affiliations:
Department of Mathematics, MIT, Cambridge MA;Department of Mathematics, MIT, Cambridge MA
Venue:
Journal of Computer and System Sciences - STOC 2001
Year:
2004

Abstract

We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. As in [BFKV 99], a point x is defined to be a β-outlier if there exists some direction w in which its squared distance from the mean along w is greater than β times the average squared distance from the mean along w. Our main theorem is that for any ε > 0, there exists a (1 - ε) fraction of the original distribution that has no O((n/ε)(b + log(n/ε)))-outliers, improving on the previous bound of O(n^7 b/ε). This is asymptotically the best possible, as shown by a matching lower bound. The theorem is constructive, and results in a 1/(1 - ε) approximation to the following optimization problem: given a distribution µ (i.e. the ability to sample from it), and a parameter ε > 0, find the minimum β for which there exists a subset of probability at least (1 - ε) with no β-outliers.
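The β-outlier definition above can be checked numerically for a finite point set. The following is a minimal sketch, not the paper's algorithm: it assumes the distribution is the uniform distribution over the given points, and the function names `outlier_scores` and `beta_outliers` are hypothetical. It relies on the standard fact that the maximum over directions w of (w·(x - mean))² divided by the average of (w·(y - mean))² equals (x - mean)ᵀ M⁺ (x - mean), where M is the matrix of second moments about the mean and x - mean lies in the range of M (which holds for points of the set itself).

```python
import numpy as np

def outlier_scores(points):
    """For each point x, return the largest ratio, over all directions w,
    of the squared distance of x from the mean along w to the average
    squared distance from the mean along w."""
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    # Second-moment matrix about the mean (empirical covariance, 1/N scaling).
    M = centered.T @ centered / len(X)
    # The pseudo-inverse handles point sets that do not span the whole space.
    M_inv = np.linalg.pinv(M)
    # Quadratic form (x - mean)^T M^+ (x - mean), evaluated row by row.
    return np.einsum("ij,jk,ik->i", centered, M_inv, centered)

def beta_outliers(points, beta):
    """Boolean mask marking the points that are beta-outliers."""
    return outlier_scores(points) > beta
```

Under these assumptions, a point is a β-outlier exactly when its score exceeds β, so the smallest β for which a given subset has no β-outliers is the maximum score within that subset (recomputed with the subset's own mean and second moments).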