Unified linear subspace approach to semantic analysis

  • Authors:
  • Dandan Li; Chung-Ping Kwong; Dik Lun Lee

  • Affiliations:
  • Department of Computer Science and Engineering, Hong Kong University of Science & Technology, Hong Kong; Department of Mechanical and Automation Engineering, Chinese University of Hong Kong, Hong Kong; Department of Computer Science and Engineering, Hong Kong University of Science & Technology, Hong Kong

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2010

Abstract

The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume that a dataset has an underlying latent semantic structure that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI only the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable: even when the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, retrieval performance is not degraded significantly. Experiments performed on standard test collections show that our methods are promising. © 2010 Wiley Periodicals, Inc.
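
To make the contrast between the three baseline models concrete, the sketch below writes each one as a similarity (kernel) function over term vectors. It uses the formulations commonly given in the kernel-methods literature, which may differ in detail from the definitions used in the article, and it does not attempt to reproduce the unified kernel, which the abstract does not state. The toy term-by-document matrix A, the function names, and the choice k = 1 are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy term-by-document matrix (rows = terms, columns = documents).
# Terms, in row order: car, automobile, engine, flower.
A = np.array([
    [1.0, 0.0, 1.0, 0.0],  # car
    [0.0, 1.0, 1.0, 0.0],  # automobile
    [1.0, 1.0, 0.0, 0.0],  # engine
    [0.0, 0.0, 0.0, 1.0],  # flower
])


def bvsm_kernel(x, y):
    """BVSM: plain inner product, so only literally shared terms contribute."""
    return float(x @ y)


def gvsm_kernel(x, y, A):
    """GVSM (common kernel form): x^T A A^T y.

    Each term is represented by its row of A, so two terms are related
    when they occur in the same documents.
    """
    return float((A.T @ x) @ (A.T @ y))


def lsi_kernel(x, y, A, k):
    """LSI: inner product after projecting onto the top-k left singular vectors of A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]  # rank-k truncation of the term-side basis
    return float((Uk.T @ x) @ (Uk.T @ y))


query = np.array([1.0, 0.0, 0.0, 0.0])     # contains only "car"
document = np.array([0.0, 1.0, 0.0, 0.0])  # contains only "automobile"

print("BVSM:", bvsm_kernel(query, document))
print("GVSM:", gvsm_kernel(query, document, A))
print("LSI (k=1):", lsi_kernel(query, document, A, k=1))
```

In this toy collection the query and the document share no term, so the BVSM score is zero, while the GVSM and LSI scores are positive because "car" and "automobile" co-occur with the same terms and documents. This is the literal-matching limitation that the abstract attributes to BVSM.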