Topics in 0–1 data

Authors:
Ella Bingham; Heikki Mannila; Jouni K. Seppänen
Affiliations:
Helsinki University of Technology, FIN-02015 HUT, Finland; Helsinki University of Technology, FIN-02015 HUT, Finland; Helsinki University of Technology, FIN-02015 HUT, Finland
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Citing 11

Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data

Independent component analysis, a new concept?
Signal Processing - Special issue on higher order statistics

A maximum entropy approach to natural language processing
Computational Linguistics

Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence

Fast discovery of association rules
Advances in knowledge discovery and data mining

Latent semantic indexing: a probabilistic analysis
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Probabilistic query models for transaction data
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Probabilistic Models for Query Approximation with Large Sparse Binary Data Sets
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence

Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions
Neural Computation

Cited 2

Dense itemsets
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Maximally informative k-itemsets and their efficient discovery
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Abstract

Large 0–1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which indicate why certain methods succeed or fail. We describe simple algorithms for finding topic models from 0–1 data. We give theoretical results showing that the algorithms can discover the ε-separable topic models of Papadimitriou et al. We present empirical results showing that the algorithms find natural topics in real-world data sets. We also briefly discuss the connections to matrix approaches, including nonnegative matrix factorization and independent component analysis.