Understanding and enhancing the folding-in method in latent semantic indexing

  • Authors:
  • Xiang Wang;Xiaoming Jin

  • Affiliations:
  • School of Software, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China

  • Venue:
  • DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
  • Year:
  • 2006

Abstract

Latent Semantic Indexing (LSI) has proven effective at capturing the semantic structure of document collections and is widely used in content-based text retrieval. However, in many real-world applications that deal with very large document collections, LSI suffers from high computational complexity, which stems from the Singular Value Decomposition (SVD) step. As a result, the folding-in method is widely used in practice as an approximation to LSI. It is, however, generally implemented "as is", without detailed analysis of its effectiveness and performance. Consequently, the performance of the folding-in method cannot be guaranteed. In this paper, we first illustrate the underlying principle of the folding-in method from a linear algebra point of view and analyze several commonly used existing techniques. Based on this theoretical analysis, we propose a novel algorithm to guide the implementation of the folding-in method. Our method is justified and evaluated through a series of experiments on several classical IR data sets. The results indicate that our method is effective and performs consistently across different document collections.
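
For orientation, the standard folding-in step as it is usually described in the LSI literature can be sketched as follows. This is not the enhanced algorithm the paper proposes, whose details the abstract does not give. Given a rank-k truncated SVD of the term-document matrix, A ≈ U_k Σ_k V_k^T, a new document vector d is projected into the existing space as Σ_k^{-1} U_k^T d without recomputing the SVD. The function names `lsi_fit` and `fold_in` and the toy matrix are assumptions made for illustration only.

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets; rows of V_k are document coordinates.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(U_k, s_k, d):
    """Project a term vector d into the existing k-dim space: Sigma_k^{-1} U_k^T d."""
    return (U_k.T @ d) / s_k

# Toy term-document matrix (4 terms x 3 documents); hypothetical data.
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

U_k, s_k, V_k = lsi_fit(A, k=2)

# Fold in a new document without recomputing the SVD.
d_new = np.array([1, 0, 0, 1], dtype=float)
d_hat = fold_in(U_k, s_k, d_new)

# Sanity check: folding in a document already in A recovers its row of V_k.
d0_hat = fold_in(U_k, s_k, A[:, 0])
```

Because U_k and Σ_k are held fixed, each folded-in document costs only a matrix-vector product, but the basis no longer reflects the new documents. That trade-off between cost and accuracy is what the abstract refers to when it calls folding-in an approximation to LSI.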