Spectral clustering with limited independence

  • Authors: Anirban Dasgupta (Cornell University), John Hopcroft (Cornell University), Ravi Kannan (Yale University), Pradipta Mitra (Yale University)

  • Venue: SODA '07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year: 2007

Abstract

This paper considers the well-studied problem of clustering a set of objects under a probabilistic model of data in which each object is represented as a vector over the set of features and there are only k different types of objects. Earlier results (mixture models and "planted" problems on graphs) generally assume that all coordinates of all objects are independent random variables, and then appeal to the theory of random matrices to infer spectral properties of the feature × object matrix. In most practical applications, however, assuming full independence is not realistic. Instead, we assume only that the objects are independent; the coordinates of each object need not be. We first generalize the required random-matrix results to this case of limited independence, using new techniques developed in functional analysis. Surprisingly, we are able to prove results quite similar to those of the fully independent case, modulo an extra logarithmic factor. Using these bounds, we develop clustering algorithms for the more general mixture models. Our clustering algorithms have a substantially different, and perhaps simpler, "clean-up" phase than known algorithms. We show that our model subsumes not only the planted partition random graph models, but also the Gaussian and log-concave mixture models, for which there is a substantial body of clustering algorithms.
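
As a rough illustration of the generic spectral approach the abstract refers to, the sketch below projects the feature × object matrix onto its top-k singular directions and then clusters the projected objects. The function name, the SciPy/scikit-learn calls, and the use of k-means as the clean-up step are assumptions for illustration only; the paper's own clean-up phase is different.

    # Minimal sketch, not the paper's algorithm: rank-k spectral projection
    # followed by a simple k-means clean-up step (an assumption here).
    import numpy as np
    from scipy.sparse.linalg import svds
    from sklearn.cluster import KMeans

    def spectral_cluster(A, k):
        """A: feature x object matrix (one column per object); k: number of object types."""
        # Top-k singular value decomposition of the data matrix.
        U, s, Vt = svds(np.asarray(A, dtype=float), k=k)
        # Project each object (column of A) onto the span of the top-k left singular vectors.
        projected = U.T @ A  # shape (k, number of objects)
        # Clean-up: cluster the projected objects with k-means.
        return KMeans(n_clusters=k, n_init=10).fit_predict(projected.T)

On a toy mixture with k well-separated type centers, the projected columns concentrate near their type means; bounds of this kind on the spectral projection are what the random-matrix results in the paper control, even when the coordinates within an object are dependent.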