The aim of latent semantic indexing (LSI) is to uncover the relationships between terms, hidden concepts, and documents. LSI uses the matrix factorization technique known as singular value decomposition (SVD). In this paper, we apply LSI to standard benchmark collections. We find that LSI yields poor retrieval accuracy on the TREC 2, 7, 8, and 2004 collections. We believe this negative result is robust because we evaluate more LSI variants than any previous work. First, we show that using Okapi BM25 weights for terms in documents improves the performance of LSI. Second, we derive novel scoring methods that implement the ideas of query expansion and score regularization in the LSI framework. Third, we show how to combine the BM25 method with LSI methods. All proposed methods are evaluated experimentally on the four TREC collections mentioned above. The experiments show that the new variants of LSI improve upon previous LSI methods. Nevertheless, no way of using LSI achieves a worthwhile improvement in retrieval accuracy over BM25.