Probabilistic Latent Semantic Analysis (PLSA) is a statistical latent class model that has recently received considerable attention. In its usual formulation it cannot assign likelihoods to unseen documents. Furthermore, it assigns a probability of zero to unseen documents during training. We point out that one of the two existing alternative formulations of the Expectation-Maximization (EM) algorithm for PLSA does not require this assumption. However, even that formulation does not allow calculation of the actual likelihood values. We therefore derive a new test-data likelihood substitute for PLSA and compare it to three existing likelihood substitutes. An empirical evaluation shows that our new likelihood substitute gives the best predictions of accuracy in two different IR tasks and is therefore best suited to determine the number of EM steps when training PLSA models. The new likelihood measure and its evaluation also suggest that PLSA is not very sensitive to overfitting for the two tasks considered. For these tasks, this makes extensions such as tempered EM, which specifically target overfitting, unnecessary.
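For orientation, the following is a minimal sketch of plain EM training for PLSA in the common parameterization P(w|d) = sum_z P(w|z) P(z|d), written in pure Python. It is not the likelihood substitute derived in this work; it only tracks the training-set log-likelihood after each EM step. The function name `plsa_em`, the toy count matrix, and the random initialization are assumptions made for illustration, and the sketch assumes every document contains at least one token. Because P(z|d) is estimated only for the training documents, the sketch also illustrates why an unseen document has no directly computable likelihood under this formulation.

```python
import math
import random


def plsa_em(counts, n_topics, n_steps, seed=0):
    """Illustrative PLSA training with plain EM (hypothetical helper).

    counts: list of rows, counts[d][w] = number of times word w occurs in doc d.
    Returns (P(w|z), P(z|d), training log-likelihood after each EM step).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(size):
        # Strictly positive random weights, normalized to sum to one.
        weights = [rng.random() + 0.1 for _ in range(size)]
        total = sum(weights)
        return [w / total for w in weights]

    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # P(w|z)
    p_z_d = [random_dist(n_topics) for _ in range(n_docs)]   # P(z|d)
    history = []

    for _ in range(n_steps):
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint)
                for z in range(n_topics):
                    expected = counts[d][w] * joint[z] / norm
                    new_wz[z][w] += expected
                    new_zd[d][z] += expected
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [[v / sum(row) for v in row] for row in new_wz]
        p_z_d = [[v / sum(row) for v in row] for row in new_zd]

        # Training log-likelihood under the updated parameters.
        loglik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] > 0:
                    p_w_d = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics))
                    loglik += counts[d][w] * math.log(p_w_d)
        history.append(loglik)
    return p_w_z, p_z_d, history


# Toy corpus (made up): 4 documents over a 4-word vocabulary.
toy_counts = [
    [4, 3, 0, 0],
    [5, 2, 1, 0],
    [0, 0, 3, 4],
    [0, 1, 2, 5],
]
p_w_z, p_z_d, history = plsa_em(toy_counts, n_topics=2, n_steps=20)
```

The training log-likelihood in `history` never decreases under EM, so on its own it cannot indicate when to stop; that is the role a held-out likelihood substitute of the kind discussed above would play.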