How latent is latent semantic analysis?

  • Authors:
  • Peter Wiemer-Hastings

  • Affiliations:
  • University of Memphis, Department of Psychology, Memphis, TN

  • Venue:
  • IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2

  • Year:
  • 1999

Abstract

Latent Semantic Analysis (LSA) is a statistical, corpus-based text comparison mechanism that was originally developed for the task of information retrieval, but in recent years has produced remarkably human-like abilities in a variety of language tasks. LSA has taken the Test of English as a Foreign Language and performed as well as non-native English speakers who were successful college applicants. It has shown an ability to learn words at a rate similar to that of humans. It has even graded papers as reliably as human graders. We have used LSA as a mechanism for evaluating the quality of student responses in an intelligent tutoring system, and its performance equals that of human raters with intermediate domain knowledge. It has been claimed that LSA's text-comparison abilities stem primarily from its use of a statistical technique called singular value decomposition (SVD), which compresses a large amount of term and document co-occurrence information into a smaller space. This compression is said to capture the semantic information that is latent in the corpus itself. We test this claim by comparing LSA both to a version of LSA without SVD and to a simple keyword-matching model.
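To make the comparison concrete, the sketch below is a minimal illustration of the three models under test, not the paper's implementation: it builds a term-document co-occurrence matrix from a hypothetical toy corpus, derives LSA vectors via truncated SVD, and contrasts the resulting similarity scores with those of the uncompressed vector space and of plain keyword overlap. The corpus, the dimensionality k=2, and the raw-count weighting are illustrative assumptions; real LSA spaces are typically built from large corpora with term weighting and a few hundred retained dimensions.

```python
import numpy as np

# Toy corpus: each "document" is a short text (in the tutoring setting,
# student answers and expected answers; these sentences are placeholders).
docs = [
    "the force accelerates the mass",
    "acceleration is force divided by mass",
    "the pumpkin fell from the roof",
    "gravity pulled the pumpkin toward the ground",
]

# Term-document co-occurrence matrix A (rows = terms, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0

def cosine(u, v):
    """Cosine similarity, the comparison measure used by vector-space models."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Model 1: LSA. SVD factors A into U S Vt; keeping only the k largest
# singular values compresses the co-occurrence information into a
# lower-dimensional space (k=2 here only because the corpus is tiny).
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs_lsa = (np.diag(S[:k]) @ Vt[:k]).T   # documents in the reduced space

# Model 2: the same vector space without SVD (raw term-count vectors).
doc_vecs_raw = A.T

# Model 3: simple keyword matching (proportion of shared word types).
def keyword_match(d1, d2):
    w1, w2 = set(d1.split()), set(d2.split())
    return len(w1 & w2) / len(w1 | w2)

i, j = 0, 1  # two documents about force and mass
print("LSA similarity:    ", cosine(doc_vecs_lsa[i], doc_vecs_lsa[j]))
print("Raw-vector cosine: ", cosine(doc_vecs_raw[i], doc_vecs_raw[j]))
print("Keyword overlap:   ", keyword_match(docs[i], docs[j]))
```

On a corpus this small the SVD step changes little; the question the paper poses is precisely whether, at realistic corpus sizes, the compression contributes semantic information beyond what the raw co-occurrence vectors and keyword overlap already carry.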