On the equivalence of PLSI and projected clustering

  • Authors:
  • Charu C. Aggarwal

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • ACM SIGMOD Record
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of projected clustering was first proposed in the ACMSIGMOD Conference in 1999, and the Probabilistic Latent Semantic Indexing (PLSI) technique was independently proposed in the ACMSIGIR Conference in the same year. Since then, more than two thousand papers have been written on these problems by the database, data mining and information retrieval communities, along completely independent lines of work. In this paper, we show that these two problems are essentially equivalent, under a probabilistic interpretation to the projected clustering problem. We will show that the EM-algorithm, when applied to the probabilistic version of the projected clustering problem, can be almost identically interpreted as the PLSI technique. The implications of this equivalence are significant, in that they imply the cross-usability of many of the techniques which have been developed for these problems over the last decade. We hope that our observations about the equivalence of these problems will stimulate further research which can significantly improve the currently available solutions for either of these problems.