This paper presents a novel statistical method for factor analysis of binary and count data that is closely related to a technique known as Latent Semantic Analysis (LSA). LSA stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables. In contrast, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. For model fitting, we propose a temperature-controlled version of the Expectation Maximization (EM) algorithm, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis (PLSA) has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
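To make the "probabilistic mixture decomposition" and "temperature-controlled EM" concrete, here is a minimal pure-Python sketch, not the paper's implementation. It assumes the aspect-model form P(d, w) = Σ_z P(z) P(d|z) P(w|z) and a tempered E-step in which the joint P(z) P(d|z) P(w|z) is raised to an inverse temperature β before normalizing (β = 1 recovers plain EM). Several simplifications are assumptions of this sketch: β is held fixed instead of being adjusted during training, perplexity is computed on the training counts rather than on held-out data, and the helper names `plsa_tem` and `perplexity` as well as the toy documents are hypothetical.

```python
import math
import random


def plsa_tem(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit P(d, w) = sum_z P(z) P(d|z) P(w|z) with a tempered EM (hypothetical helper).

    counts: list of dicts, counts[d][w] = n(d, w).
    beta: inverse temperature, held fixed here (assumption); beta = 1.0 is plain EM.
    Returns (p_z, p_d_z, p_w_z): P(z), P(d|z) as lists, P(w|z) as dicts.
    """
    rng = random.Random(seed)
    n_docs = len(counts)
    vocab = sorted({w for doc in counts for w in doc})

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive initialization so the topics can break symmetry.
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [normalized([rng.random() + 0.1 for _ in range(n_docs)]) for _ in range(n_topics)]
    p_w_z = [dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts accumulated during the E-step.
        acc_z = [0.0] * n_topics
        acc_d_z = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # Tempered posterior: P_beta(z|d,w) proportional to the joint ** beta.
                scores = [(p_z[z] * p_d_z[z][d] * p_w_z[z][w]) ** beta
                          for z in range(n_topics)]
                total = sum(scores)
                for z in range(n_topics):
                    r = n * scores[z] / total
                    acc_z[z] += r
                    acc_d_z[z][d] += r
                    acc_w_z[z][w] += r
        # M-step: renormalize the expected counts into distributions.
        total_n = sum(acc_z)
        p_z = [a / total_n for a in acc_z]
        p_d_z = [[a / acc_z[z] for a in acc_d_z[z]] for z in range(n_topics)]
        p_w_z = [{w: a / acc_z[z] for w, a in acc_w_z[z].items()} for z in range(n_topics)]
    return p_z, p_d_z, p_w_z


def perplexity(counts, p_z, p_d_z, p_w_z):
    """Training-set perplexity exp(-sum n(d,w) log P(w|d) / sum n(d,w))."""
    n_topics = len(p_z)
    log_lik, total = 0.0, 0
    for d, doc in enumerate(counts):
        # P(z|d) via Bayes' rule from P(z) and P(d|z).
        joint = [p_z[z] * p_d_z[z][d] for z in range(n_topics)]
        p_d = sum(joint)
        p_z_d = [j / p_d for j in joint]
        for w, n in doc.items():
            p_w_d = sum(p_z_d[z] * p_w_z[z][w] for z in range(n_topics))
            log_lik += n * math.log(p_w_d)
            total += n
    return math.exp(-log_lik / total)


# Toy corpus (hypothetical): two "sports" and two "politics" documents.
docs = [
    {"ball": 3, "goal": 2, "team": 2},
    {"goal": 1, "team": 3, "coach": 1},
    {"vote": 2, "party": 3, "election": 1},
    {"party": 1, "election": 3, "vote": 1},
]
p_z, p_d_z, p_w_z = plsa_tem(docs, n_topics=2, beta=0.9, n_iter=100)
print(round(perplexity(docs, p_z, p_d_z, p_w_z), 2))
```

Setting β slightly below 1 flattens the posteriors P(z|d,w), which is the intuition behind tempering: it discourages the fit from committing too early to sharp assignments. A fuller treatment would tune β on held-out data and report held-out perplexity, as the abstract's experiments suggest.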