In the past decade, Probabilistic Latent Semantic Indexing (PLSI) has become an important modeling technique, widely used in clustering and graph partitioning. However, the original PLSI is designed for multinomial data and cannot readily handle other data types. To remove this restriction, we generalize PLSI to the t-exponential family using a recently proposed information criterion called the t-divergence. The t-divergence is more flexible than the KL-divergence used in the original PLSI, so it can accommodate more types of noise in the data. To optimize the generalized learning objective, we propose a Majorization-Minimization algorithm that multiplicatively updates the factorizing matrices. The new method is verified on pairwise clustering tasks. Experimental results on real-world datasets show that PLSI with t-divergence improves clustering purity on certain datasets.
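The abstract refers to multiplicative updates of the factorizing matrices under a Majorization-Minimization scheme. The paper's t-divergence updates are not reproduced here; as context, the sketch below shows the classical KL-divergence multiplicative updates (Lee & Seung style) for a PLSI/NMF-type factorization X ≈ WH, which is the special case the t-divergence generalizes. The function name and defaults are illustrative, not from the paper.

```python
import numpy as np

def plsi_multiplicative(X, k, iters=200, seed=0):
    """Illustrative multiplicative updates minimizing the generalized
    KL-divergence D(X || WH), the classical PLSI/NMF objective.

    Note: this is NOT the paper's t-divergence algorithm, only the
    KL-divergence special case, shown to illustrate the multiplicative
    update pattern the abstract describes.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialization of both factors.
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        # Update W: W_ik *= sum_j (X_ij / (WH)_ij) H_kj / sum_j H_kj
        WH = W @ H + 1e-12
        W *= (X / WH) @ H.T / H.sum(axis=1)
        # Update H: H_kj *= sum_i W_ik (X_ij / (WH)_ij) / sum_i W_ik
        WH = W @ H + 1e-12
        H *= W.T @ (X / WH) / W.sum(axis=0)[:, None]
    return W, H
```

Because every update multiplies by a nonnegative ratio, the factors stay nonnegative throughout, which is the property Majorization-Minimization schemes of this family preserve.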