Out-of-core SVD performance for document indexing

Authors:
Dian I. Martin; John C. Martin; Michael W. Berry; Murray Browne
Affiliations:
Small Bear Technical Consulting, LLC, 1458 Pawpaw Road, Thorn Hill, TN 37881, USA; Small Bear Technical Consulting, LLC, 1458 Pawpaw Road, Thorn Hill, TN 37881, USA; Department of Computer Science, University of Tennessee, Knoxville, TN 37996-3450, USA; Department of Computer Science, University of Tennessee, Knoxville, TN 37996-3450, USA
Venue:
Applied Numerical Mathematics
Year:
2007



Abstract

This study formally evaluates the performance tradeoffs and scalability of computing the sparse matrix singular value decomposition (SVD) out-of-core, as part of the Latent Semantic Analysis (LSA) of a document collection. Most software packages capable of computing the SVD do all of their processing in-core, keeping every vector used in the computation in memory, which limits the size of the document collections that can be processed. The goal of the study was specifically to evaluate software that performs the SVD calculations out-of-core, minimizing memory usage by keeping only a small set of work vectors in memory at a time. The performance measures of interest were execution time, in both CPU time and wall-clock time, as well as the memory and disk usage required to compute the SVD.
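
To make the in-core versus out-of-core distinction concrete, the sketch below estimates the largest singular value of a matrix that lives on disk, reading only one block of rows at a time. It is an illustration only and not the software evaluated in the study. It assumes a dense matrix stored through `numpy.memmap`, and it uses plain power iteration on A^T A as a simple stand-in for the sparse solvers normally used for LSA. The names `gram_matvec_out_of_core`, `largest_singular_value`, `demo`, and the `block_rows` parameter are all hypothetical.

```python
import os
import tempfile

import numpy as np


def gram_matvec_out_of_core(path, shape, v, block_rows=256):
    """Return (A^T A) v, reading A from disk one block of rows at a time."""
    n_rows, n_cols = shape
    # The memmap maps the file without loading it all into memory.
    A = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
    result = np.zeros(n_cols)
    for start in range(0, n_rows, block_rows):
        # Slicing touches only the rows of this block.
        block = np.asarray(A[start:start + block_rows])
        result += block.T @ (block @ v)
    return result


def largest_singular_value(path, shape, iters=200, seed=0):
    """Power iteration on A^T A; only a few work vectors stay in memory."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = gram_matvec_out_of_core(path, shape, v)
        v = w / np.linalg.norm(w)
    # Rayleigh quotient of A^T A at v approximates the largest sigma^2.
    return float(np.sqrt(v @ gram_matvec_out_of_core(path, shape, v)))


def demo():
    """Write a small test matrix to disk and compare with an in-core SVD."""
    rng = np.random.default_rng(1)
    dense = rng.standard_normal((1000, 50))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.dat")
        on_disk = np.memmap(path, dtype=np.float64, mode="w+", shape=dense.shape)
        on_disk[:] = dense
        on_disk.flush()
        del on_disk  # release the writable mapping before reading back
        estimate = largest_singular_value(path, dense.shape)
    exact = float(np.linalg.svd(dense, compute_uv=False)[0])
    return estimate, exact


if __name__ == "__main__":
    estimate, exact = demo()
    print(f"out-of-core estimate: {estimate:.6f}, in-core SVD: {exact:.6f}")
```

The tradeoff mirrors the one the study measures: resident memory is bounded by one row block plus a handful of work vectors, while every iteration pays for a full pass over the file, so disk traffic and wall-clock time grow with the number of iterations.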