Similarity-based clustering by left-stochastic matrix factorization

  • Authors:
  • Raman Arora; Maya R. Gupta; Amol Kapila; Maryam Fazel

  • Affiliations:
  • Toyota Technological Institute, Chicago, IL; Google, Mountain View, CA; Department of Electrical Engineering, University of Washington, Seattle, WA; Department of Electrical Engineering, University of Washington, Seattle, WA

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2013

Abstract

For similarity-based clustering, we propose modeling the entries of a given similarity matrix as the inner products of the unknown cluster probabilities. To estimate the cluster probabilities from the given similarity matrix, we introduce a left-stochastic non-negative matrix factorization problem. A rotation-based algorithm is proposed for the matrix factorization. Conditions for unique matrix factorizations and clusterings are given, and an error bound is provided. The algorithm is particularly efficient for the case of two clusters, which motivates a hierarchical variant for cases where the number of desired clusters is large. Experiments show that the proposed left-stochastic decomposition clustering model produces relatively high within-cluster similarity on most data sets and can match given class labels, and that the efficient hierarchical variant performs surprisingly well.
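The modeling assumption in the abstract can be illustrated with a small sketch. It is not the paper's rotation-based factorization algorithm; it only shows the forward model, in which each similarity entry is the inner product of two samples' cluster-probability vectors. The matrix `P`, the helper names, and the numbers are hypothetical, and any scaling constant the full model may apply to the similarity matrix is omitted here.

```python
# Illustrative sketch only: hypothetical data and helper names,
# not the rotation-based algorithm proposed in the paper.

def is_left_stochastic(P, tol=1e-9):
    """Return True if all entries are non-negative and every column sums to 1."""
    k, n = len(P), len(P[0])
    nonneg = all(P[r][j] >= 0 for r in range(k) for j in range(n))
    cols_ok = all(abs(sum(P[r][j] for r in range(k)) - 1.0) <= tol for j in range(n))
    return nonneg and cols_ok

def similarity_from_probabilities(P):
    """K[i][j] is the inner product of the probability columns of samples i and j."""
    k, n = len(P), len(P[0])
    return [[sum(P[r][i] * P[r][j] for r in range(k)) for j in range(n)]
            for i in range(n)]

def hard_assignments(P):
    """Assign each sample to the cluster with the largest probability."""
    k, n = len(P), len(P[0])
    return [max(range(k), key=lambda r: P[r][j]) for j in range(n)]

# Assumed example: 2 clusters (rows), 4 samples (columns).
P = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.8],
]
K = similarity_from_probabilities(P)
labels = hard_assignments(P)
```

In this toy case, samples 0 and 1 share a dominant cluster, so their modeled similarity is larger than that between samples 0 and 2. The clustering problem described in the abstract runs in the opposite direction: given a similarity matrix, estimate a left-stochastic `P` whose inner products approximate it.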