Latent concepts and the number orthogonal factors in latent semantic analysis

Authors:
Georges Dupret
Affiliations:
-
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2003

Abstract

We seek insight into Latent Semantic Indexing by establishing a method to identify the optimal number of factors in the reduced matrix for representing a keyword. The method is demonstrated empirically by duplicating every document that contains a term t and inserting the copies into the database with t replaced by t'. By counting how often term t is retrieved for a search on term t' (precision) over differing ranges of dimensions, we find that lower-ranked dimensions identify related terms, while higher-ranked dimensions discriminate between the synonyms.
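
The synonym-injection experiment described in the abstract can be sketched on a toy corpus. The code below is only an illustration under stated assumptions: the five-term matrix, the helper names (`add_synonym_copies`, `term_vectors`, `cosine`) and the use of a plain truncated SVD from NumPy are all hypothetical choices made here, and the paper's own factorization and precision measure may differ. It duplicates every document containing t, swaps t for t' in the copies, and reports how similar t and t' look when only the first k dimensions are kept.

```python
import numpy as np


def add_synonym_copies(X, t):
    """Append a new term row t' and a copy of every document (column)
    that contains term t, with t replaced by t' in the copies."""
    docs_with_t = np.flatnonzero(X[t] > 0)
    copies = X[:, docs_with_t].copy()
    t_counts = copies[t].copy()   # counts of t that move to t'
    copies[t] = 0                 # the copies no longer contain t
    original_terms = np.hstack([X, copies])
    t_prime_row = np.concatenate([np.zeros(X.shape[1]), t_counts])
    return np.vstack([original_terms, t_prime_row])


def term_vectors(X):
    """Term coordinates in the factor space (rows of U scaled by singular values)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U * s


def cosine(V, i, j, lo, hi):
    """Cosine similarity of terms i and j restricted to dimensions lo..hi-1."""
    a, b = V[i, lo:hi], V[j, lo:hi]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy term-document matrix (rows are terms, columns are documents); assumed data.
X = np.array([
    [1, 1, 0, 0, 1],  # car   (plays the role of term t)
    [1, 0, 1, 0, 0],  # engine
    [0, 1, 1, 0, 0],  # wheel
    [0, 0, 0, 1, 1],  # fruit
    [0, 0, 0, 1, 0],  # apple
], dtype=float)

t = 0
X_aug = add_synonym_copies(X, t)
t_prime = X_aug.shape[0] - 1      # index of the injected synonym t'

V = term_vectors(X_aug)
n_dims = V.shape[1]

# Similarity between t and t' when only the first k dimensions are kept.
sims = {k: cosine(V, t, t_prime, 0, k) for k in range(1, n_dims + 1)}
for k, sim in sims.items():
    print(f"first {k} dimension(s): cos(t, t') = {sim:+.3f}")
```

Because t and t' never occur in the same document, their similarity over the full set of dimensions is zero, while the leading dimension alone places them together. This mirrors, on a very small scale, the abstract's contrast between dimensions that relate the synonyms and dimensions that separate them.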