Using latent semantic analysis to improve access to textual information

  • Authors:
  • S. T. Dumais; G. W. Furnas; T. K. Landauer; S. Deerwester; R. Harshman

  • Affiliations:
  • Bell Communications Research; Bell Communications Research; Bell Communications Research; Univ. of Chicago, Chicago, IL; Univ. of Western Ontario

  • Venue:
  • CHI '88 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  • Year:
  • 1988

Abstract

This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term-by-text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50- to 150-dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
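
The decomposition and matching steps described in the abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the raw-count term weighting, the function names (`build_term_doc_matrix`, `lsi_fit`, `score_query`), the cosine similarity measure, and the choice of k = 2 factors are all hypothetical. The abstract itself reports about 50 to 150 factors on much larger collections.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper).
docs = [
    "human computer interaction",
    "user interface for computer system",
    "graph of trees",
    "trees and graph minors",
]

def build_term_doc_matrix(docs):
    """Return a term-by-document count matrix and a term -> row index map."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1.0
    return X, index

def lsi_fit(X, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # term factors, singular values, doc factors

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def score_query(query, docs, k):
    """Project a query into the k-factor space and score it against each document."""
    X, index = build_term_doc_matrix(docs)
    U_k, s_k, V_k = lsi_fit(X, k)
    q = np.zeros(X.shape[0])
    for w in query.split():
        if w in index:  # words outside the vocabulary are ignored
            q[index[w]] += 1.0
    q_proj = q @ U_k       # query coordinates on the k factors
    doc_proj = V_k * s_k   # document coordinates, scaled by singular values
    return [cosine(q_proj, d) for d in doc_proj]

scores = score_query("user", docs, k=2)
```

In this toy setup the query "user" should score close to 1 against the first document even though that document never contains the word, because both load on the same factor through the shared term "computer". A purely lexical match would miss it, which is the vocabulary problem the abstract describes.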