Information Theoretic Clustering of Sparse Co-Occurrence Data

  • Authors:
  • Inderjit S. Dhillon;Yuqiang Guan

  • Affiliations:
  • -;-

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

A novel approach to clustering co-occurrence data posesit as an optimization problem in information theory whichminimizes the resulting loss in mutual information. A divisiveclustering algorithm that monotonically reduces thisloss function was recently proposed. In this paper we showthat sparse high-dimensional data presents special challengeswhich can result in the algorithm getting stuck atpoor local minima. We propose two solutions to this problem:(a) a "prior" to overcome infinite relative entropy valuesas in the supervised Naive Bayes algorithm, and (b)local search to escape local minima. Finally, we combinethese solutions to get a robust algorithm that is computationallyefficient. We present experimental results to showthat the proposed method is effective in clustering documentcollections and outperforms previous information-theoreticclustering approaches.