Topics in 0–1 data

  • Authors:
  • Ella Bingham; Heikki Mannila; Jouni K. Seppänen

  • Affiliations:
  • Helsinki University of Technology, FIN-02015 HUT, Finland; Helsinki University of Technology, FIN-02015 HUT, Finland; Helsinki University of Technology, FIN-02015 HUT, Finland

  • Venue:
  • Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2002

Abstract

Large 0–1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which indicate why certain methods succeed or fail. We describe simple algorithms for finding topic models from 0–1 data. We give theoretical results showing that the algorithms can discover the ε-separable topic models of Papadimitriou et al. We present empirical results showing that the algorithms find natural topics in real-world datasets. We also briefly discuss the connections to matrix approaches, including nonnegative matrix factorization and independent component analysis.