Sequential information bottleneck for finite data

Authors:
Jaakko Peltonen;Janne Sinkkonen;Samuel Kaski
Affiliations:
Helsinki University of Technology, Finland;Helsinki University of Technology, Finland;Helsinki University of Technology, Finland
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Cites (6):

Unsupervised document classification using sequential information maximization. SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Variational Extensions to EM and Multinomial PCA. ECML '02 Proceedings of the 13th European Conference on Machine Learning
Discriminative Clustering: Optimal Contingency Tables by Learning Metrics. ECML '02 Proceedings of the 13th European Conference on Machine Learning
Statistical Models for Co-occurrence Data
Latent Dirichlet allocation. The Journal of Machine Learning Research
Probabilistic latent semantic analysis. UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence

Cited by (3):

Associative Clustering for Exploring Dependencies between Functional Genomics Data Sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Multi-way distributional clustering via pairwise interactions. ICML '05 Proceedings of the 22nd international conference on Machine learning
Iterative sIB algorithm. Pattern Recognition Letters

Abstract

The sequential information bottleneck (sIB) algorithm clusters co-occurrence data such as text documents vs. words. We introduce a variant that models sparse co-occurrence data by a generative process. This turns the objective function of sIB, mutual information, into a Bayes factor, while keeping it intact asymptotically, for non-sparse data. Experimental performance of the new algorithm is comparable to the original sIB for large data sets, and better for smaller, sparse sets.
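
As a rough illustration of the objective mentioned in the abstract, the sketch below computes the empirical mutual information between clusters and words from a document-word count table, then performs one sequential pass that moves each document to the cluster giving the highest mutual information. It is a simplified, brute-force sketch and not the authors' algorithm: the function names (`mutual_information`, `cluster_table`, `sequential_sweep`) and the toy counts are assumptions made for illustration, and the Bayes factor objective of the finite-data variant is not implemented here.

```python
import math


def mutual_information(counts):
    """Empirical mutual information (in nats) of a 2-D table of counts."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # zero cells contribute nothing
                mi += (n / total) * math.log(n * total / (row_sums[i] * col_sums[j]))
    return mi


def cluster_table(doc_word, assignment, n_clusters):
    """Sum document rows into a cluster-by-word count table."""
    n_words = len(doc_word[0])
    table = [[0] * n_words for _ in range(n_clusters)]
    for doc, c in zip(doc_word, assignment):
        for j, n in enumerate(doc):
            table[c][j] += n
    return table


def sequential_sweep(doc_word, assignment, n_clusters):
    """One pass over the documents: each document is reassigned to the
    cluster that maximizes the cluster-word mutual information.
    Brute force (recomputes the full table each time), for clarity only."""
    assignment = list(assignment)
    for d in range(len(doc_word)):
        best_c, best_mi = assignment[d], -1.0
        for c in range(n_clusters):
            assignment[d] = c
            mi = mutual_information(cluster_table(doc_word, assignment, n_clusters))
            if mi > best_mi + 1e-12:
                best_c, best_mi = c, mi
        assignment[d] = best_c
    return assignment


# Hypothetical toy data: 4 documents, 2 words.
doc_word = [[5, 0], [4, 1], [0, 6], [1, 5]]
result = sequential_sweep(doc_word, [0, 1, 0, 1], n_clusters=2)
```

On this toy table the pass groups the two documents dominated by the first word apart from the two dominated by the second. With very sparse counts the empirical estimate used here becomes unreliable, which is the situation the abstract's generative variant is meant to address.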