Unsupervised Learning by Probabilistic Latent Semantic Analysis

Authors: Thomas Hofmann
Affiliations: Department of Computer Science, Brown University, Providence, RI 02912, USA. th@cs.brown.edu
Venue: Machine Learning
Year: 2001


Abstract

This paper presents a novel statistical method for factor analysis of binary and count data that is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. For model fitting, we propose a temperature-controlled version of the Expectation Maximization algorithm, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
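The abstract names the two ingredients of the method, a latent class model and a temperature-controlled EM fit, without giving formulas. The sketch below illustrates one common way to write them down; it is not taken from this record. It assumes the symmetric parameterization P(d, w) = sum over z of P(z) P(d|z) P(w|z), and a tempered E-step in which the class-conditional term is raised to an inverse temperature `beta` (with `beta = 1` reducing to ordinary EM). The function names, the toy count matrix, and the fixed `beta` are all hypothetical choices for illustration.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-by-word count matrix.

    counts[d][w] is how often word w occurs in document d.
    beta is an assumed inverse-temperature parameter (beta=1 is plain EM).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [normalize([rng.random() + 0.1 for _ in range(n_docs)])
             for _ in range(n_topics)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # Tempered E-step: posterior over latent classes for (d, w).
                post = normalize([
                    p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                    for z in range(n_topics)
                ])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    expected = counts[d][w] * post[z]
                    new_z[z] += expected
                    new_d[z][d] += expected
                    new_w[z][w] += expected
        # M-step: renormalize expected counts into probabilities.
        p_z = normalize(new_z)
        p_d_z = [normalize(row) for row in new_d]
        p_w_z = [normalize(row) for row in new_w]
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Training log-likelihood of the counts under the mixture model."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p_dw = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                           for z in range(len(p_z)))
                total += n * math.log(p_dw)
    return total


# Hypothetical toy corpus: two documents per vocabulary block.
counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 0, 0],
    [0, 0, 0, 2, 4, 3],
    [0, 0, 0, 3, 2, 4],
]
p_z, p_d_z, p_w_z = plsa_tempered_em(counts, n_topics=2, beta=0.9)
print(log_likelihood(counts, p_z, p_d_z, p_w_z))
```

The sketch holds `beta` fixed and reports only the training log-likelihood. It does not reproduce the held-out perplexity evaluation mentioned in the abstract, and any schedule for adjusting the temperature would be a further assumption.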