A novel updating method for Probabilistic Latent Semantic Analysis (PLSA), called Recursive PLSA (RPLSA), is proposed. The updates of the conditional probabilities are derived from first principles for both the asymmetric and the symmetric PLSA formulations. For both formulations, the performance of RPLSA is compared to that of PLSA folding-in, PLSA rerun from the breakpoint, and well-known LSA updating methods, such as singular value decomposition (SVD) folding-in and SVD-updating. The experimental results demonstrate that RPLSA outperforms the other updating methods under study on two criteria: it attains a higher average log-likelihood, and it yields a lower average absolute error between the probabilities estimated by the updating method and those obtained by applying the non-adaptive PLSA from scratch. A comparison in terms of CPU run time is conducted as well. Finally, in document clustering evaluated with the Adjusted Rand index, the clusters generated by RPLSA are shown to be (a) similar to those generated by PLSA applied from scratch and (b) closer to the ground truth than those created by the other PLSA or LSA updating methods.
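For orientation, the PLSA folding-in baseline named above can be sketched as follows. This is a minimal illustration of standard folding-in for the asymmetric formulation, not the RPLSA update proposed here: the word-given-topic probabilities P(w|z) are held fixed and only P(z|d) for the new document is re-estimated by EM. The function names, the NumPy array layout, and the default iteration count are assumptions made for this example.

```python
import numpy as np


def fold_in(counts, p_w_given_z, n_iter=50, seed=0):
    """Estimate P(z|d_new) for one new document with P(w|z) held fixed.

    counts      : shape (W,), term counts n(d_new, w); assumed non-empty
    p_w_given_z : shape (W, Z), each column sums to 1
    """
    rng = np.random.default_rng(seed)
    n_topics = p_w_given_z.shape[1]
    # Random starting point on the probability simplex.
    p_z_given_d = rng.dirichlet(np.ones(n_topics))
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
        joint = p_w_given_z * p_z_given_d
        denom = joint.sum(axis=1, keepdims=True)
        post = np.divide(joint, denom, out=np.zeros_like(joint), where=denom > 0)
        # M-step: P(z|d) is proportional to sum_w n(d,w) * P(z|d,w).
        p_z_given_d = counts @ post
        p_z_given_d = p_z_given_d / p_z_given_d.sum()
    return p_z_given_d


def avg_log_likelihood(counts, p_w_given_z, p_z_given_d):
    """Per-token log-likelihood of the document under P(w|d) = sum_z P(w|z) P(z|d)."""
    p_w_given_d = p_w_given_z @ p_z_given_d
    mask = counts > 0  # only words that occur contribute
    return float(counts[mask] @ np.log(p_w_given_d[mask]) / counts.sum())
```

Because P(w|z) is never updated, a new document cannot change the topics themselves; that limitation is what updating methods such as the one proposed here aim to remove.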