An effective approach to document retrieval via utilizing WordNet and recognizing phrases

  • Authors:
  • Shuang Liu; Fang Liu; Clement Yu; Weiyi Meng

  • Affiliations:
  • University of Illinois at Chicago, Chicago, IL; University of Illinois at Chicago, Chicago, IL; University of Illinois at Chicago, Chicago, IL; Binghamton University, Binghamton, NY

  • Venue:
  • Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2004

Abstract

Noun phrases in queries are identified and classified into four types: proper names, dictionary phrases, simple phrases and complex phrases. A document is considered to contain a phrase if all content words of the phrase occur within a window of a certain size. The window size differs by phrase type and is determined using a decision tree. Because phrases are more important than individual terms, documents retrieved for a query are ranked with matching phrases given higher priority. We utilize WordNet to disambiguate the senses of query terms. Whenever the sense of a query term is determined, its synonyms, hyponyms, words from its definition and its compound words are considered for possible addition to the query. Experimental results show that our approach yields improvements of between 23% and 31% over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
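
The window test and the phrase-first ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `phrase_in_window` and `rank`, the window sizes in `WINDOWS`, and the use of simple match counts in place of real similarity scores are all assumptions made for illustration. The paper derives its window sizes from a decision tree, and those values are not given here.

```python
def phrase_in_window(doc_tokens, phrase_words, window):
    """Return True if all phrase words occur inside some span of at most
    `window` consecutive tokens (sliding-window check)."""
    targets = set(phrase_words)
    if not targets:
        return False
    counts = {}      # occurrences of each phrase word inside the current span
    covered = 0      # number of distinct phrase words inside the current span
    left = 0
    for right, tok in enumerate(doc_tokens):
        if tok in targets:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left until the span holds at most `window` tokens.
        while right - left + 1 > window:
            old = doc_tokens[left]
            if old in targets:
                counts[old] -= 1
                if counts[old] == 0:
                    covered -= 1
            left += 1
        if covered == len(targets):
            return True
    return False


def rank(docs, phrases, terms, windows):
    """Order document ids by (phrase matches, term matches), descending.
    Phrase matches are compared first, so they take priority over terms."""
    def key(item):
        _, tokens = item
        phrase_hits = sum(
            phrase_in_window(tokens, words, windows[ptype])
            for ptype, words in phrases
        )
        term_hits = sum(1 for t in terms if t in tokens)
        return (phrase_hits, term_hits)

    ordered = sorted(docs.items(), key=key, reverse=True)
    return [doc_id for doc_id, _ in ordered]


# Hypothetical window sizes per phrase type (illustrative values only).
WINDOWS = {"proper_name": 3, "dictionary": 3, "simple": 5, "complex": 8}

docs = {
    "d1": "the world health organization met today".split(),
    "d2": "a report on health in every organization of the world".split(),
    "d3": "a report on world trade".split(),
}
phrases = [("proper_name", ["world", "health", "organization"])]
terms = ["world", "health", "organization", "report"]

print(rank(docs, phrases, terms, WINDOWS))  # ['d1', 'd2', 'd3']
```

In this example d2 matches more individual terms than d1, but d1 ranks first because it contains the phrase within the window, which mirrors the priority given to phrases over single terms.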