Sequential information bottleneck for finite data

  • Authors:
  • Jaakko Peltonen; Janne Sinkkonen; Samuel Kaski

  • Affiliations:
  • Helsinki University of Technology, Finland; Helsinki University of Technology, Finland; Helsinki University of Technology, Finland

  • Venue:
  • ICML '04 Proceedings of the twenty-first international conference on Machine learning
  • Year:
  • 2004

Abstract

The sequential information bottleneck (sIB) algorithm clusters co-occurrence data, such as text documents vs. words. We introduce a variant that models sparse co-occurrence data with a generative process. This turns the objective function of sIB, mutual information, into a Bayes factor, while keeping it asymptotically intact for non-sparse data. In experiments, the new algorithm performs comparably to the original sIB on large data sets and better on smaller, sparse ones.
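The abstract names the quantity that sIB optimizes, the mutual information I(C;W) between clusters and words, but not the procedure itself. As a rough illustration, here is a minimal Python sketch of the standard hard-clustering sIB loop as it is commonly formulated. Each document is drawn out of its cluster and merged back into the cluster whose merge loses the least mutual information, where that loss is measured by a weighted Jensen-Shannon divergence. This sketches the original mutual-information objective, not the Bayes-factor variant introduced in the paper. The function names (`sib`, `merge_cost`, `mutual_information`), the round-robin initialization, and the rule of never emptying a cluster are illustrative assumptions.

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def merge_cost(row, cluster, total):
    """Loss in I(C;W) from merging one document's counts into a cluster.

    Computed as (p(x) + p(c)) * JS_pi(p(w|x), p(w|c)), with the
    Jensen-Shannon weights pi proportional to p(x) and p(c).
    """
    sx, sc = sum(row), sum(cluster)
    if sc == 0:
        return 0.0  # empty cluster: merging loses no information
    px, pc = sx / total, sc / total
    p = [v / sx for v in row]
    q = [v / sc for v in cluster]
    w1, w2 = px / (px + pc), pc / (px + pc)
    m = [w1 * a + w2 * b for a, b in zip(p, q)]
    return (px + pc) * (w1 * kl(p, m) + w2 * kl(q, m))


def sib(counts, k, max_sweeps=50, seed=0):
    """Hard sequential IB (hypothetical helper): one cluster label per row."""
    n_docs, n_words = len(counts), len(counts[0])
    if not 1 <= k <= n_docs or any(sum(r) == 0 for r in counts):
        raise ValueError("need 1 <= k <= n_docs and no empty rows")
    total = sum(map(sum, counts))
    rng = random.Random(seed)
    order = list(range(n_docs))
    rng.shuffle(order)
    labels = [0] * n_docs
    for i, x in enumerate(order):  # round-robin start: no cluster is empty
        labels[x] = i % k
    clusters = [[0] * n_words for _ in range(k)]
    sizes = [0] * k
    for x, c in enumerate(labels):
        sizes[c] += 1
        for w in range(n_words):
            clusters[c][w] += counts[x][w]
    for _ in range(max_sweeps):
        changed = 0
        rng.shuffle(order)
        for x in order:
            old = labels[x]
            if sizes[old] == 1:
                continue  # assumption: never empty a cluster
            for w in range(n_words):  # draw x out of its cluster
                clusters[old][w] -= counts[x][w]
            sizes[old] -= 1
            new = min(range(k),
                      key=lambda c: merge_cost(counts[x], clusters[c], total))
            for w in range(n_words):  # merge x into the cheapest cluster
                clusters[new][w] += counts[x][w]
            sizes[new] += 1
            labels[x] = new
            changed += new != old
        if changed == 0:  # a full sweep with no moves: local optimum
            break
    return labels


def mutual_information(counts, labels, k):
    """I(C;W) in nats for a hard clustering of the document rows."""
    n_words = len(counts[0])
    total = sum(map(sum, counts))
    joint = [[0.0] * n_words for _ in range(k)]
    for x, c in enumerate(labels):
        for w in range(n_words):
            joint[c][w] += counts[x][w] / total
    pc = [sum(row) for row in joint]
    pw = [sum(joint[c][w] for c in range(k)) for w in range(n_words)]
    mi = 0.0
    for c in range(k):
        for w in range(n_words):
            if joint[c][w] > 0:
                mi += joint[c][w] * math.log(joint[c][w] / (pc[c] * pw[w]))
    return mi
```

For example, `sib([[1, 0], [1, 0], [0, 1], [0, 1]], 2)` puts the first two rows in one cluster and the last two in the other, and `mutual_information` of that split is ln 2 ≈ 0.693 nats. Because each move picks the smallest merge cost, every reassignment keeps I(C;W) as high as possible for that document, which is why the loop stops at a local optimum rather than guaranteeing the global one.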