Pattern Recognition Letters
The sequential information bottleneck (sIB) algorithm clusters co-occurrence data, such as occurrences of words in text documents. We introduce a variant that models sparse co-occurrence data with a generative process. This turns the sIB objective function, mutual information, into a Bayes factor, while leaving the objective asymptotically intact for non-sparse data. In experiments, the new algorithm performs comparably to the original sIB on large data sets and better on smaller, sparse ones.
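
The relationship between mutual information and a Bayes factor can be illustrated on a small contingency table of cluster-versus-word counts. The abstract does not state the form of the Bayes factor, so the sketch below rests on an assumption: it compares a "dependent" model (one multinomial over all cells) with an "independent" model (separate multinomials over rows and columns), each under a symmetric Dirichlet prior. The function names and the prior parameter `alpha` are hypothetical and are not taken from the paper.

```python
from math import lgamma, log


def empirical_mi(table):
    """Empirical mutual information (in nats) of a table of counts."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                mi += (n / total) * log(n * total / (row_sums[i] * col_sums[j]))
    return mi


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of an observation sequence with these counts,
    under a symmetric Dirichlet(alpha) prior (assumed prior, see lead-in)."""
    k = len(counts)
    total = sum(counts)
    return (lgamma(k * alpha) - lgamma(total + k * alpha)
            + sum(lgamma(n + alpha) - lgamma(alpha) for n in counts))


def log_bayes_factor(table, alpha=1.0):
    """Log Bayes factor of the dependent model against the independent one."""
    cells = [n for row in table for n in row]
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    dependent = log_dirichlet_multinomial(cells, alpha)
    independent = (log_dirichlet_multinomial(row_sums, alpha)
                   + log_dirichlet_multinomial(col_sums, alpha))
    return dependent - independent


if __name__ == "__main__":
    # Same cell proportions at increasing sample sizes.
    for scale in (1, 100, 10000):
        table = [[3 * scale, 1 * scale], [1 * scale, 3 * scale]]
        n_total = sum(sum(row) for row in table)
        print(scale, empirical_mi(table), log_bayes_factor(table) / n_total)
```

Under these assumptions, the log Bayes factor divided by the total count approaches the empirical mutual information as the counts grow, while for small, sparse tables the prior terms make the two quantities differ. This mirrors the asymptotic agreement described in the abstract, without claiming to reproduce the paper's exact objective.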