Matrix analysis
Random walks and an O*(n^5) volume algorithm for convex bodies
Random Structures & Algorithms
Semi-definite relaxations for minimum bandwidth and other vertex-ordering problems
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
A polynomial-time algorithm for learning noisy linear threshold functions
FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
Proceedings of the nineteenth annual symposium on Computational geometry
Optimal outlier removal in high-dimensional spaces
Journal of Computer and System Sciences - STOC 2001
Clustering via minimum volume ellipsoids
Computational Optimization and Applications
A discriminative model for semi-supervised learning
Journal of the ACM (JACM)
A PAC-Style model for learning from labeled and unlabeled data
COLT'05 Proceedings of the 18th annual conference on Learning Theory
We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. A point x is defined to be a $\beta$-outlier if there exists some direction w in which its squared distance from the mean along w is greater than $\beta$ times the average squared distance from the mean along w [1]. Our main theorem is that for any $\epsilon > 0$, there exists a $(1-\epsilon)$ fraction of the original distribution that has no $O(\frac{n}{\epsilon}(b + \log\frac{n}{\epsilon}))$-outliers, improving on the previous bound of $O(n^7 b/\epsilon)$. This bound is shown to be nearly the best possible. The theorem is constructive, and results in a $\frac{1}{1-\epsilon}$ approximation to the following optimization problem: given a distribution $\mu$ (i.e., the ability to sample from it) and a parameter $\epsilon > 0$, find the minimum $\beta$ for which there exists a subset of probability at least $(1-\epsilon)$ with no $\beta$-outliers.
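The outlier definition above quantifies over every direction w, which looks hard to test directly. A minimal sketch of one way to check it for a finite point set follows. It relies on a standard linear-algebra fact that the abstract itself does not state: when the covariance matrix C is nonsingular, the largest ratio of a point's squared deviation along w to the average squared deviation along w, taken over all w, equals the quadratic form (x - mean)^T C^{-1} (x - mean). The function names, the uniform weighting of the points, and the singularity tolerance are assumptions made for illustration; this is not the paper's outlier-removal algorithm, only a test of the definition.

```python
def mean_and_covariance(points):
    """Mean and covariance of a finite point set with uniform weights."""
    m = len(points)
    n = len(points[0])
    mean = [sum(p[i] for p in points) / m for i in range(n)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / m
            for j in range(n)] for i in range(n)]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        # Assumed tolerance: treat a tiny pivot as a singular covariance.
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def outlier_score(x, mean, cov):
    """Return (x - mean)^T cov^{-1} (x - mean), the worst ratio over all w."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)
    return sum(di * zi for di, zi in zip(d, z))


def is_beta_outlier(x, points, beta):
    """True if x is a beta-outlier relative to the given point set."""
    mean, cov = mean_and_covariance(points)
    return outlier_score(x, mean, cov) > beta


# Hypothetical example data: four points symmetric about the origin.
points = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(is_beta_outlier((0.0, 3.0), points, beta=10.0))
```

Under this reading, the score averaged over the point set itself always equals the dimension n, so no bound on the outlier parameter below n can hold for every point; that is consistent with the factor of n in the bound stated in the abstract.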