Enhancing the Set-Based Model Using Proximity Information

  • Authors:
  • Bruno Pôssas; Nivio Ziviani; Wagner Meira, Jr.

  • Venue:
  • SPIRE 2002: Proceedings of the 9th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2002

Abstract

This work enhances the set-based model (SBM), an effective technique for computing term weights based on co-occurrence patterns, by employing information about the proximity among query terms in documents. It builds on the intuition that occurrences of semantically related terms tend to lie close to each other, leading to a new information retrieval model called the proximity set-based model (PSBM). The novelty is that proximity information is used as a pruning strategy, so that only co-occurrence patterns of related terms are retained. This technique is time-efficient and yet yields notable improvements in retrieval effectiveness. Experimental results show that PSBM improves the average precision of the answer set for all four collections evaluated. For the CFC collection, PSBM achieves gains over the standard vector space model (VSM) of 23% in average precision and 55% in average precision for the top 10 documents. PSBM is also competitive in computational performance, reducing the execution time of the SBM by 21% for the CISI collection.
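The abstract does not spell out how proximity is measured or how co-occurrence patterns (termsets) are enumerated. As a minimal sketch, assuming a termset is kept for a document only when some window of at most `max_window` token positions contains an occurrence of each of its terms, the pruning idea can be illustrated as follows. The names `min_cover_span`, `proximate_termsets`, `max_window`, the `positions` input format, and the level-wise (Apriori-style) enumeration are hypothetical choices for illustration, not the authors' PSBM algorithm.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

# Hypothetical input format: term -> sorted token positions of that term in one document.
Positions = Dict[str, List[int]]


def min_cover_span(positions: Positions, terms: Iterable[str]) -> Optional[int]:
    """Smallest distance (in tokens) between the first and last occurrence of a
    window containing at least one occurrence of every term; None if a term is absent."""
    terms = list(terms)
    events = sorted((p, t) for t in terms for p in positions.get(t, []))
    need = len(set(terms))
    counts: Dict[str, int] = {}
    best: Optional[int] = None
    left = 0
    for pos_r, term_r in events:
        counts[term_r] = counts.get(term_r, 0) + 1
        # Shrink the window from the left while it still covers every term.
        while len(counts) == need:
            pos_l, term_l = events[left]
            span = pos_r - pos_l
            if best is None or span < best:
                best = span
            counts[term_l] -= 1
            if counts[term_l] == 0:
                del counts[term_l]
            left += 1
    return best


def proximate_termsets(positions: Positions, query_terms: Iterable[str],
                       max_window: int) -> Set[FrozenSet[str]]:
    """Enumerate query termsets whose terms co-occur within max_window tokens (assumed rule)."""
    present = sorted({t for t in query_terms if positions.get(t)})
    level = {frozenset([t]) for t in present}  # single terms: span 0, always kept
    kept = set(level)
    while level:
        candidates = set()
        for base in level:
            for t in present:
                if t in base:
                    continue
                cand = base | {t}
                # Downward closure: every subset one term smaller must have survived.
                if all(cand - {x} in level for x in cand):
                    candidates.add(cand)
        # Proximity pruning: drop candidates whose terms never fall within the window.
        level = {c for c in candidates
                 if min_cover_span(positions, c) <= max_window}
        kept |= level
    return kept


# Toy document (hypothetical): token positions of each query term.
positions = {"set": [0, 9], "based": [1], "model": [2], "proximity": [12]}
kept = proximate_termsets(positions, ["set", "based", "model", "proximity"], max_window=3)
print(len(kept))                                     # 9
print(frozenset({"set", "based", "model"}) in kept)  # True  (span 2)
print(frozenset({"based", "proximity"}) in kept)     # False (span 11 > 3)
```

The level-wise loop relies on the fact that a superset's minimal covering span can never be smaller than that of any of its subsets, so once a termset fails the window test, none of its supersets needs to be checked. This is where the sketch saves work compared with enumerating every combination of query terms.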