Knowledge-Enhanced Latent Semantic Indexing

  • Authors:
  • David Guo; Michael W. Berry; Bryan B. Thompson; Sidney Bailin

  • Affiliations:
  • Department of Computer Science, 203 Claxton Complex, University of Tennessee, Knoxville, TN, 37996-3450, USA. dguo@cs.utk.edu; Department of Computer Science, 203 Claxton Complex, University of Tennessee, Knoxville, TN, 37996-3450, USA. berry@cs.utk.edu; Global Wisdom, Inc., 1737 Harvard Street, NW, Washington, DC 20009, USA. bryan@globalwisdom.org; Knowledge Evolution, Inc., 1050 17th Street, NW, Suite 520, Washington, DC 20036, USA. sbailin@kevol.com

  • Venue:
  • Information Retrieval
  • Year:
  • 2003

Abstract

Latent Semantic Indexing (LSI) is a popular information retrieval model for concept-based searching. As with many vector space IR models, LSI requires an existing term-document association structure such as a term-by-document matrix. The term-by-document matrix, constructed during document parsing, can only capture weighted vocabulary occurrence patterns in the documents. However, for many knowledge domains there are pre-existing semantic structures that could be used to organize and categorize information. The goals of this study are (i) to demonstrate how such semantic structures can be automatically incorporated into the LSI vector space model, and (ii) to measure the effect of these structures on query matching performance. The new approach, referred to as Knowledge-Enhanced LSI, is applied to documents in the OHSUMED medical abstracts collection using the semantic structures provided by the UMLS Semantic Network and MeSH. Results based on precision-recall data (11-point average precision values) indicate that a MeSH-enhanced search index is capable of delivering noticeable incremental performance gain (as much as 35%) over the original LSI for modest constraints on precision. This performance gain is achieved by replacing the original query with the MeSH heading extracted from the query text via regular expression matches.
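
The query-replacement step described in the last sentence can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_heading_patterns` and `rewrite_query`, the three-entry heading list, and the fallback to the original query when nothing matches are all assumptions made for illustration. A real system would load the full MeSH vocabulary and would probably also handle entry terms and word variants.

```python
import re

# Tiny illustrative sample only (an assumption); a real index would
# load the complete MeSH heading list from the vocabulary files.
SAMPLE_HEADINGS = ["Myocardial Infarction", "Hypertension", "Diabetes Mellitus"]


def build_heading_patterns(headings):
    """Compile one case-insensitive, whole-phrase pattern per heading."""
    patterns = []
    for heading in headings:
        # Escape each word and allow any run of whitespace between words.
        words = [re.escape(word) for word in heading.split()]
        regex = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
        patterns.append((heading, regex))
    return patterns


def rewrite_query(query, patterns):
    """Replace the query with the headings it mentions.

    Headings are returned in vocabulary order. If no heading matches,
    the original query is returned unchanged (an assumed fallback).
    """
    found = [heading for heading, regex in patterns if regex.search(query)]
    return " ".join(found) if found else query


if __name__ == "__main__":
    patterns = build_heading_patterns(SAMPLE_HEADINGS)
    print(rewrite_query("60 year old male with hypertension", patterns))
```

The rewritten query string would then be projected into the LSI space in place of the original free-text query; that projection step is outside the scope of this sketch.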