Approximate Kernel Clustering

  • Authors:
  • Subhash Khot; Assaf Naor

  • Venue:
  • FOCS '08 Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science
  • Year:
  • 2008

Abstract

In the kernel clustering problem we are given a large $n\times n$ positive semidefinite matrix $A=(a_{ij})$ with $\sum_{i,j=1}^n a_{ij}=0$ and a small $k\times k$ positive semidefinite matrix $B=(b_{ij})$. The goal is to find a partition $S_1,\ldots,S_k$ of $\{1,\ldots,n\}$ which maximizes the quantity
$$\sum_{i,j=1}^k \left(\sum_{(p,q)\in S_i\times S_j} a_{pq}\right) b_{ij}.$$
We study the computational complexity of this generic clustering problem, which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song, Smola, Gretton and Borgwardt. In some cases we manage to compute the sharp approximation threshold for this problem assuming the Unique Games Conjecture (UGC). In particular, when $B$ is the $3\times 3$ identity matrix, the UGC hardness threshold of this problem is exactly $\frac{16\pi}{27}$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k\times k$ identity matrix is $\frac{8\pi}{9}\left(1-\frac{1}{k}\right)$ for every $k\ge 3$.
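To make the objective concrete, here is a minimal sketch that evaluates $\sum_{i,j}\left(\sum_{(p,q)\in S_i\times S_j} a_{pq}\right) b_{ij}$ for a given partition and finds the optimum by exhaustive search over all $k^n$ labelings. This is only an illustration for tiny instances. It is not the constant factor approximation algorithm of the paper, and the function names `kernel_clustering_value` and `brute_force_kernel_clustering` are hypothetical. A partition is encoded as a label vector, where `labels[p] = i` means that point `p` lies in $S_i$ (0-indexed).

```python
from itertools import product


def kernel_clustering_value(A, B, labels):
    """Objective sum_{i,j} (sum_{(p,q) in S_i x S_j} a_pq) * b_ij.

    labels[p] = i means point p belongs to part S_i (0-indexed).
    """
    k = len(B)
    # block[i][j] accumulates sum of a_pq over p in S_i, q in S_j.
    block = [[0.0] * k for _ in range(k)]
    for p, i in enumerate(labels):
        for q, j in enumerate(labels):
            block[i][j] += A[p][q]
    return sum(block[i][j] * B[i][j] for i in range(k) for j in range(k))


def brute_force_kernel_clustering(A, B):
    """Exhaustive search over all k**n labelings; feasible only for tiny n."""
    n, k = len(A), len(B)
    best_value, best_labels = float("-inf"), None
    for labels in product(range(k), repeat=n):
        value = kernel_clustering_value(A, B, labels)
        if value > best_value:  # keep the first maximizer found
            best_value, best_labels = value, labels
    return best_value, best_labels


# Example: A is the Gram matrix of centered 1-D points, so A is PSD
# and its entries sum to (sum of x)^2 = 0; B is the 2 x 2 identity.
x = [-1.5, -0.5, 0.5, 1.5]
A = [[xp * xq for xq in x] for xp in x]
B = [[1.0, 0.0], [0.0, 1.0]]
print(brute_force_kernel_clustering(A, B))  # (8.0, (0, 0, 1, 1))
```

When $A$ is the Gram matrix of centered vectors $v_p$ and $B = I_k$, the objective equals $\sum_i \lVert \sum_{p\in S_i} v_p \rVert^2$, so in this example the best split separates the negative points from the positive ones: $(-2)^2 + 2^2 = 8$.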