Using latent semantic indexing for information filtering

  • Authors: P. W. Foltz
  • Affiliations: Box 345, Dept. of Psychology, University of Colorado, Boulder, CO
  • Venue: COCS '90 Proceedings of the ACM SIGOIS and IEEE CS TC-OA conference on Office information systems
  • Year: 1990

Abstract

Latent Semantic Indexing (LSI) is an information retrieval method that organizes information into a semantic structure that takes advantage of some of the implicit higher-order associations of words with text objects. The resulting structure reflects the major associative patterns in the data while ignoring some of the smaller variations that may be due to idiosyncrasies in the word usage of individual documents. This permits retrieval based on the “latent” semantic content of the documents rather than just on keyword matches. This paper evaluates the use of LSI for filtering information, such as Netnews articles, based on a model of user preferences for articles. Users judged how interesting articles were, and based on these judgements, LSI predicted whether new articles would be judged interesting. LSI improved prediction performance over keyword matching by an average of 13% and showed a 26% improvement in precision over presenting articles in the order received. The results indicate that user preferences for articles tend to cluster based on the semantic similarities between articles.
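
The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of the general idea, assuming LSI is realized as a truncated singular value decomposition of a term-by-document count matrix and that a new article is scored by its average cosine similarity to previously judged articles. The toy articles, the choice of k = 2, and all function names (term_doc_matrix, lsi_fit, fold_in, mean_similarity) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Count matrix with one row per term and one column per document."""
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs):
        for word in text.lower().split():
            if word in row:
                A[row[word], j] += 1.0
    return A

def lsi_fit(A, k):
    """Truncated SVD: keep only the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of the third return value are the k-dimensional document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(counts, U_k, s_k):
    """Project a new document's term counts into the k-dimensional space."""
    return (counts @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def mean_similarity(vec, doc_vecs, mask):
    """Average cosine between vec and the document vectors selected by mask."""
    return float(np.mean([cosine(vec, d) for d, m in zip(doc_vecs, mask) if m]))

# Hypothetical toy data: four already-judged articles and the user's judgements.
docs = [
    "graphics rendering shading",
    "rendering shading pixels",
    "baseball pitcher inning",
    "pitcher inning baseball game",
]
liked = [True, True, False, False]

vocab = sorted({w for d in docs for w in d.split()})
A = term_doc_matrix(docs, vocab)
U_k, s_k, doc_vecs = lsi_fit(A, k=2)  # k=2 is an arbitrary choice for this toy case

# A new, unjudged article is folded into the same space and compared
# with the articles the user did and did not find interesting.
new_counts = term_doc_matrix(["graphics pixels"], vocab)[:, 0]
new_vec = fold_in(new_counts, U_k, s_k)

score_liked = mean_similarity(new_vec, doc_vecs, liked)
score_other = mean_similarity(new_vec, doc_vecs, [not x for x in liked])
predicted_interesting = score_liked > score_other
print(score_liked, score_other, predicted_interesting)
```

In this sketch the new article shares only one word with each of the liked articles, yet it lands on the same latent dimension as both of them, which illustrates how matching on latent structure can differ from plain keyword overlap.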