Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Computational Methods for Intelligent Information Access
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Understanding search engines: mathematical modeling and text retrieval
On Updating Problems in Latent Semantic Indexing
SIAM Journal on Scientific Computing
Matrices with Low-Rank-Plus-Shift Structure: Partial SVD and Latent Semantic Indexing
SIAM Journal on Matrix Analysis and Applications
Modern Information Retrieval
Introduction to Modern Information Retrieval
Dynamic normal forms and dynamic characteristic polynomial
Theoretical Computer Science
Latent Semantic Indexing (LSI) is an information retrieval (IR) method that connects IR with numerical linear algebra by representing a dataset as a term-document matrix. Because modern databases are so large, such matrices can be extremely large. The partial singular value decomposition (PSVD) is a matrix factorization that captures the salient features of a matrix while using much less storage. We look at two challenges posed by this PSVD data compression process in LSI. Traditional methods of computing the PSVD are very expensive; most of the processing time in LSI is spent calculating the PSVD of the term-document matrix. In a rapidly expanding environment such as the Internet, the term-document matrix changes often as new documents and terms are added, and updating the PSVD of this matrix is much more efficient than recalculating it after each change. Thus, the first challenge is updating the PSVD efficiently when the matrix is altered slightly. The second challenge is calculating the PSVD efficiently in terms of computational and memory requirements. We investigate the use of the PSVD updating methods proposed by Zha and Simon [H. Zha, H.D. Simon, On updating problems in latent semantic indexing, SIAM J. Sci. Comput. 21 (2) (1999) 782-791] to meet both of these challenges. Results are presented illustrating that updating in this manner provides substantial savings in computation time, with no significant reduction in accuracy. An algorithm for iteratively computing the PSVD of a matrix using the document updating method is also presented. This iterative method, suggested by Zha and Zhang [H. Zha, Z. Zhang, Matrices with low-rank-plus-shift structure: partial SVD and latent semantic indexing, SIAM J. Matrix Anal. Appl. 21 (2) (1999) 522-536], provides a means of calculating the PSVD for matrices so large that the computation would be infeasible using traditional methods. Again, results are given showing that this method can provide savings in memory resources and computational time without compromising the accuracy of the results.
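To make the document-updating idea concrete, the following is a minimal NumPy sketch of a Zha-Simon style update and of an iterative PSVD built from it. It is an illustration under stated assumptions, not the code used for the reported results: the function names (truncated_svd, update_documents, iterative_psvd), the dense-matrix representation, and the block parameter are all hypothetical choices made here. The update relies on the identity [A_k, D] = [U, Q] B diag(V, I)^T, where QR = (I - U U^T) D and B = [[S, U^T D], [0, R]], so only the small matrix B needs a full SVD.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k partial SVD of a dense matrix A via a full SVD (reference only)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def update_documents(U, s, Vt, D):
    """Rank-k PSVD of [A_k, D], given A_k = U @ diag(s) @ Vt and new columns D.

    Hypothetical helper sketching a Zha-Simon style document update.
    """
    k = s.shape[0]
    n = Vt.shape[1]
    p = D.shape[1]

    # Split D into its component inside span(U) and the orthogonal remainder.
    UtD = U.T @ D
    Q, R = np.linalg.qr(D - U @ UtD)   # reduced QR of (I - U U^T) D
    q = Q.shape[1]

    # Small (k+q) x (k+p) matrix B with [A_k, D] = [U, Q] @ B @ blockdiag(Vt, I).
    B = np.block([[np.diag(s), UtD],
                  [np.zeros((q, k)), R]])
    Ub, sb, Vbt = np.linalg.svd(B)

    # Rotate the enlarged bases by the leading k singular vectors of B.
    U_new = np.hstack([U, Q]) @ Ub[:, :k]
    W = np.block([[Vt, np.zeros((k, p))],
                  [np.zeros((p, n)), np.eye(p)]])
    Vt_new = Vbt[:k, :] @ W
    return U_new, sb[:k], Vt_new


def iterative_psvd(A, k, block):
    """Approximate rank-k PSVD of A by folding in `block` columns at a time.

    Assumes block >= k so that the first block already yields k singular triplets.
    """
    U, s, Vt = truncated_svd(A[:, :block], k)
    for j in range(block, A.shape[1], block):
        U, s, Vt = update_documents(U, s, Vt, A[:, j:j + block])
    return U, s, Vt
```

In this sketch each update costs an SVD of a (k+p)-by-(k+p) matrix plus a few tall, thin products, rather than an SVD of the full term-document matrix. The iterative variant only ever holds one block of columns alongside the current factors; in general its result approximates the true PSVD rather than reproducing it exactly.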