Set-based model: a new approach for information retrieval

  • Authors:
  • Bruno Pôssas;Nivio Ziviani;Wagner Meira, Jr.;Berthier Ribeiro-Neto

  • Affiliations:
  • Universidade Federal de Minas Gerais, Belo Horizonte-MG, Brazil (all four authors)

  • Venue:
  • SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2002

Abstract

The objective of this paper is to present a new technique for computing term weights for index terms, which leads to a new ranking mechanism referred to as the set-based model. The components of our model are no longer terms, but termsets. The novelty is that we compute term weights using a data mining technique called association rules, which is time efficient and yet yields gains in retrieval effectiveness. The similarity function of the set-based model, computed between a document and a query, considers the frequency of each termset in the document and its scarcity in the document collection. Experimental results show that our model improves the average precision of the answer set for all three collections evaluated. For the TREC-3 collection, our set-based model led to a gain, relative to the standard vector space model, of 37% in average precision curves and of 57% in average precision for the top 10 documents. Like the vector space model, the set-based model has time complexity that is linear in the number of documents in the collection.
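
The idea of weighting termsets by their frequency in a document and their scarcity in the collection can be illustrated with a minimal Python sketch. This is not the authors' formulation: the abstract does not give the weighting formula, so the sketch assumes that the frequency of a termset in a document is the smallest count among its member terms, that scarcity is log(N / ds) where ds is the number of documents containing the whole termset, and that termsets are simply all query-term subsets up to a fixed size (no association rule mining, no length normalization). The names `termsets`, `termset_frequency`, `set_based_scores`, `min_support`, and `max_size` are hypothetical.

```python
import math
from itertools import combinations


def termsets(query_terms, max_size=2):
    """Yield every subset of the query terms with 1..max_size members."""
    terms = sorted(set(query_terms))
    for size in range(1, max_size + 1):
        for combo in combinations(terms, size):
            yield frozenset(combo)


def termset_frequency(termset, doc_tokens):
    """Assumed definition: the smallest count among the termset's terms."""
    return min(doc_tokens.count(term) for term in termset)


def set_based_scores(query_terms, docs, min_support=1, max_size=2):
    """Score each document as the sum of frequency * scarcity over termsets.

    Termsets found in fewer than min_support documents are ignored.
    The cost per termset is one pass over the documents.
    """
    n_docs = len(docs)
    scores = [0.0] * n_docs
    for termset in termsets(query_terms, max_size):
        freqs = [termset_frequency(termset, doc) for doc in docs]
        doc_support = sum(1 for f in freqs if f > 0)
        if doc_support < max(min_support, 1):
            continue
        # Assumed scarcity measure: rarer termsets get a larger weight.
        scarcity = math.log(n_docs / doc_support)
        for i, f in enumerate(freqs):
            scores[i] += f * scarcity
    return scores


docs = [
    ["set", "model", "set"],
    ["vector", "model"],
    ["set", "vector"],
]
scores = set_based_scores(["set", "model"], docs)
```

In this toy collection only the first document contains both query terms, so it alone receives the contribution of the two-term termset and ranks above the other two, which tie.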