Understanding latent semantic indexing: A topological structure analysis using Q-analysis

  • Authors: Dandan Li; Chung-Ping Kwong

  • Affiliations: Department of Computer Science and Engineering, Hong Kong University of Science & Technology, Hong Kong; Department of Mechanical and Automation Engineering, Chinese University of Hong Kong, Hong Kong

  • Venue: Journal of the American Society for Information Science and Technology
  • Year: 2010

Abstract

Latent semantic indexing (LSI) is well known for tackling the synonymy and polysemy problems in information retrieval; however, its performance varies considerably across datasets, and it is not yet fully understood which characteristics of a dataset contribute to this difference, or why. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin (1974) can then be applied to analyze the topological structure of the simplexes and of their corresponding dataset. Experimental results of this analysis reveal a correlation between the effectiveness of LSI and the topological structure of the dataset. Using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM on datasets for which LSI is not effective. © 2010 Wiley Periodicals, Inc.
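To make the idea of attaching simplexes to a term-document matrix concrete, the following is a minimal sketch of a textbook Q-analysis in Python. It is not the authors' implementation. It assumes that each document (row) is treated as a simplex whose vertices are the terms it contains, and that the matrix is binarized by treating any positive weight as "term present"; the abstract does not state which convention or threshold the article uses. The names `q_analysis` and `count_components` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def count_components(members, linked):
    """Count connected components among `members` with a small union-find."""
    parent = {m: m for m in members}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in members:
        for b in members:
            if a < b and linked(a, b):
                parent[find(a)] = find(b)
    return len({find(m) for m in members})


def q_analysis(incidence):
    """Return {q: number of q-connected components} for a documents-by-terms matrix.

    Assumption: rows are simplexes (documents), columns are vertices (terms),
    and any positive entry counts as incidence.
    """
    B = (np.asarray(incidence) > 0).astype(int)
    # shared[i, j] = (number of terms common to documents i and j) - 1,
    # i.e. the dimension of their shared face; the diagonal holds each
    # simplex's own dimension.
    shared = B @ B.T - 1
    top_q = int(shared.diagonal().max())
    structure = {}
    for q in range(top_q, -1, -1):
        # Only simplexes of dimension >= q take part at level q.
        members = [i for i in range(B.shape[0]) if shared[i, i] >= q]
        structure[q] = count_components(members, lambda a, b: shared[a, b] >= q)
    return structure


# Toy example: 4 documents over 5 terms (hypothetical data).
toy = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
print(q_analysis(toy))  # {2: 2, 1: 2, 0: 1}
```

The resulting mapping is the structure vector: at q = 2 the two three-term documents remain separate because they share only two terms, whereas at q = 0 all four documents fall into a single component through shared terms.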