Out-of-core SVD performance for document indexing

  • Authors:
  • Dian I. Martin; John C. Martin; Michael W. Berry; Murray Browne

  • Affiliations:
  • Small Bear Technical Consulting, LLC, 1458 Pawpaw Road, Thorn Hill, TN 37881, USA; Small Bear Technical Consulting, LLC, 1458 Pawpaw Road, Thorn Hill, TN 37881, USA; Department of Computer Science, University of Tennessee, Knoxville, TN 37996-3450, USA; Department of Computer Science, University of Tennessee, Knoxville, TN 37996-3450, USA

  • Venue:
  • Applied Numerical Mathematics
  • Year:
  • 2007

Abstract

The following study documents a formal evaluation of the performance tradeoffs and scalability of computing, with an out-of-core process, the sparse matrix singular value decomposition (SVD) required for the Latent Semantic Analysis (LSA) of a given document collection. Most software packages capable of computing the SVD do all of their processing in-core, keeping every vector used in the computation in memory. This limits the size of the document collections that can be processed. The specific goal of the study was to evaluate software capable of performing the SVD calculations out-of-core, which minimizes memory usage by keeping only a small set of work vectors in memory at a time. Performance measures of interest included execution time, in both CPU time and wall-clock time, as well as the memory and disk usage required to compute the SVD.
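A minimal sketch of the out-of-core idea, assuming Python with NumPy and SciPy: a Lanczos iteration on A^T A for a sparse term-by-document matrix A, where the Lanczos basis vectors are spilled to a disk-backed `np.memmap` and only a few length-n work vectors are held in memory at once. The function name `out_of_core_lanczos_svd` and its parameters are hypothetical; this is an illustration of the technique, not the software evaluated in the study, and it computes singular values only (no singular vectors).

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def out_of_core_lanczos_svd(A, k, steps, seed=0):
    """Approximate the k largest singular values of a sparse m x n matrix A.

    Hypothetical helper: runs up to `steps` Lanczos iterations on A^T A.
    The Lanczos basis lives in a disk-backed np.memmap; only a few
    length-n work vectors are kept in memory at any one time.
    """
    _, n = A.shape
    steps = min(steps, n)
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    alphas, betas = [], []
    beta = 0.0

    with tempfile.TemporaryDirectory() as tmp:
        # Disk-resident storage for the Lanczos basis vectors (one per row).
        basis = np.memmap(os.path.join(tmp, "basis.dat"), dtype=np.float64,
                          mode="w+", shape=(steps, n))
        for j in range(steps):
            basis[j] = q                  # spill the current vector to disk
            w = A.T @ (A @ q)             # apply A^T A without forming it
            alpha = float(q @ w)
            w -= alpha * q + beta * q_prev
            # Full reorthogonalization (two passes), reading one stored
            # basis vector back from disk at a time.
            for _ in range(2):
                for i in range(j + 1):
                    qi = np.array(basis[i])
                    w -= (qi @ w) * qi
            alphas.append(alpha)
            beta = float(np.linalg.norm(w))
            if j == steps - 1 or beta < 1e-12:
                break                     # step budget used or invariant subspace found
            betas.append(beta)
            q_prev, q = q, w / beta
        del basis                         # close the mapping before the file is deleted

    # Ritz values of A^T A are the eigenvalues of the small tridiagonal T.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz = np.linalg.eigvalsh(T)[::-1]    # descending order
    return np.sqrt(np.clip(ritz[:k], 0.0, None))
```

The design trades memory for I/O: each reorthogonalization pass rereads every stored basis vector from disk, so disk traffic grows roughly quadratically with the number of Lanczos steps, while resident memory stays at a handful of length-n vectors plus the small tridiagonal matrix.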