On the use of the singular value decomposition for text retrieval
Computational information retrieval
Latent Semantic Indexing (LSI) uses the singular value decomposition (SVD) to remove noisy dimensions and improve the performance of text retrieval systems. Preliminary studies have shown modest improvements in retrieval accuracy and recall, but they were conducted mainly on small collections. In this paper we investigate text retrieval on a larger document collection (TREC), focusing on the distribution of word norms (magnitudes). Our results indicate that word representations in LSI space are inadequate for large collections. We emphasize the query-expansion interpretation of LSI and propose an LSI term normalization that achieves better performance on larger collections.
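The query-expansion reading of LSI can be made concrete on a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It computes a rank-k truncated SVD of a tiny term-document matrix A by power iteration, an assumption chosen so the example needs only the Python standard library. It then measures each term's norm in LSI space (rows of U_k Σ_k) and uses the identity A^T (U_k U_k^T q) = V_k Σ_k U_k^T q: inner-product LSI scoring is the same as plain term matching with the expanded query q' = U_k U_k^T q. The helper names (`top_k_left_singular`, `lsi_expand_query`, `scores`) and the `normalize_terms` option, which rescales each term's LSI coordinates to unit length, are hypothetical. The paper's actual normalization may differ.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def top_k_left_singular(A, k, iters=500, seed=0):
    """Top-k left singular vectors/values of A (terms x docs, list of rows).

    Power iteration on the Gram matrix A A^T, re-orthogonalizing against the
    vectors already found. Adequate for toy matrices only; large collections
    need sparse truncated-SVD solvers.
    """
    rng = random.Random(seed)
    m = len(A)
    G = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]  # A A^T
    vecs, sigmas = [], []
    for _ in range(k):
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            w = [dot(row, v) for row in G]
            for u in vecs:  # deflation: stay orthogonal to earlier vectors
                d = dot(w, u)
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [a / norm for a in w]
        lam = dot(v, [dot(row, v) for row in G])  # Rayleigh quotient = sigma^2
        vecs.append(v)
        sigmas.append(math.sqrt(max(lam, 0.0)))
    return vecs, sigmas


def term_rows(U):
    """Row t holds term t's coordinates in LSI space (row t of U_k)."""
    return [list(coords) for coords in zip(*U)]


def lsi_expand_query(U, q, normalize_terms=False):
    """Query-expansion view of LSI: q' = U_k U_k^T q (a weight per term)."""
    rows = term_rows(U)
    if normalize_terms:
        # HYPOTHETICAL normalization for illustration: unit-length term rows.
        rows = [[x / math.sqrt(dot(r, r)) for x in r] if dot(r, r) > 0 else r
                for r in rows]
    z = [sum(r[j] * qt for r, qt in zip(rows, q)) for j in range(len(U))]  # U_k^T q
    return [dot(r, z) for r in rows]  # U_k z


def scores(A, q):
    """Document scores A^T q (inner products, no cosine normalization)."""
    return [sum(A[t][d] * q[t] for t in range(len(A))) for d in range(len(A[0]))]


terms = ["svd", "matrix", "retrieval", "query", "index"]
A = [
    [2, 1, 0, 0],  # svd
    [1, 2, 0, 0],  # matrix
    [0, 0, 2, 1],  # retrieval
    [0, 0, 1, 2],  # query
    [1, 0, 1, 0],  # index
]
U, sigma = top_k_left_singular(A, k=2)
# Norm of each term in LSI space: length of row t of U_k Sigma_k.
norms = {t: math.sqrt(sum((u * s) ** 2 for u, s in zip(r, sigma)))
         for t, r in zip(terms, term_rows(U))}
q = [0, 0, 0, 1, 0]  # a one-word query: "query"
expanded = lsi_expand_query(U, q)
print(norms)
print(scores(A, q))         # plain term matching: [0, 0, 1, 2]
print(scores(A, expanded))  # LSI scores = matching with the expanded query
```

On this toy matrix the expanded query gives "retrieval" a weight comparable to "query" itself, so the document containing "retrieval" twice outranks the one containing "query" twice. The rarer term "index" has the smallest LSI norm and therefore receives little weight in q'. With the hypothetical unit-length rows, its weight grows, which is the kind of norm effect the abstract points to.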