Clustering lines in high-dimensional space: Classification of incomplete data

Authors:
Jie Gao;Michael Langberg;Leonard J. Schulman
Affiliations:
Stony Brook University, NY;The Open University of Israel, Israel;California Institute of Technology, CA
Venue:
ACM Transactions on Algorithms (TALG)
Year:
2010

Abstract

A set of k balls B1, …, Bk in a Euclidean space is said to cover a collection of lines if every line intersects some ball. We consider the k-center problem for lines in high-dimensional space: Given a set of n lines l = {l1, …, ln} in R^d, find k balls of minimum radius which cover l. We present a 2-approximation algorithm for the cases k = 2, 3 of this problem, having running time quasi-linear in the number of lines and the dimension of the ambient space. Our result for 3-clustering is strongly based on a new result in discrete geometry that may be of independent interest: a Helly-type theorem for collections of axis-parallel “crosses” in the plane. The family of crosses does not have finite Helly number in the usual sense. Our Helly theorem is of a new type: it depends on ε-contracting the sets. In statistical practice, data is often incompletely specified; we consider lines as the most elementary case of incompletely specified data points. Clustering of data is a key primitive in nonparametric statistics. Our results provide a way of performing this primitive on incomplete data, as well as imputing the missing values.
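
The covering condition in the abstract can be made concrete with a short sketch. The Python below is only an illustration of the definition, not the paper's 2-approximation algorithm: it checks whether given balls cover given lines and computes the smallest common radius for fixed centers. All names (`dist_point_to_line`, `covers`, `min_common_radius`, `line_from_incomplete`) are hypothetical, and representing a record with exactly one missing coordinate as an axis-parallel line is an assumption about how "incompletely specified data points" map to lines.

```python
import math

def dist_point_to_line(center, point, direction):
    """Euclidean distance from `center` to the line {point + t * direction}."""
    d_norm_sq = sum(x * x for x in direction)
    if d_norm_sq == 0:
        raise ValueError("direction must be nonzero")
    diff = [c - p for c, p in zip(center, point)]
    # Parameter of the orthogonal projection of `center` onto the line.
    t = sum(a * b for a, b in zip(diff, direction)) / d_norm_sq
    residual = [a - t * b for a, b in zip(diff, direction)]
    return math.sqrt(sum(x * x for x in residual))

def covers(balls, lines, tol=1e-12):
    """True if every line intersects at least one ball.

    `balls` is a list of (center, radius); `lines` is a list of
    (point, direction). A line meets a ball exactly when its distance
    to the center is at most the radius.
    """
    return all(
        any(dist_point_to_line(c, p, d) <= r + tol for c, r in balls)
        for p, d in lines
    )

def min_common_radius(centers, lines):
    """Smallest r such that balls of radius r at the fixed `centers` cover `lines`."""
    return max(
        min(dist_point_to_line(c, p, d) for c in centers)
        for p, d in lines
    )

def line_from_incomplete(record):
    """Assumed encoding: a record with exactly one missing value (None)
    becomes the axis-parallel line along the missing coordinate."""
    missing = [i for i, v in enumerate(record) if v is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing coordinate")
    point = [0.0 if v is None else float(v) for v in record]
    direction = [1.0 if i == missing[0] else 0.0 for i in range(len(record))]
    return point, direction
```

For example, with `lines = [([0, 0, 0], [1, 0, 0]), line_from_incomplete([0.0, 5.0, None])]` and centers `[0, 1, 0]` and `[0, 5, 2]`, `min_common_radius` returns the radius at which the two balls first cover both lines. Finding the centers that minimize this radius is the k-center problem the abstract addresses; this sketch only evaluates a given choice of centers.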