Maximal termsets as a query structuring mechanism

  • Authors:
  • Bruno Pôssas; Nivio Ziviani; Berthier Ribeiro-Neto; Wagner Meira, Jr.

  • Affiliations:
  • Federal University of Minas, Belo Horizonte-MG, Brazil & Google Brasil, Belo Horizonte-MG, Brazil; Federal University of Minas, Belo Horizonte-MG, Brazil; Federal University of Minas, Belo Horizonte-MG, Brazil & Google Brasil, Belo Horizonte-MG, Brazil; Federal University of Minas, Belo Horizonte-MG, Brazil

  • Venue:
  • Proceedings of the 14th ACM international conference on Information and knowledge management
  • Year:
  • 2005

Abstract

Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web pages and the terms used to compose the Web queries. The combination of these two features might lead to irrelevant query results, particularly in the case of more specific queries composed of three or more terms. To deal with this problem we propose a new technique for automatically structuring Web queries as a set of smaller subqueries. To select representative subqueries we use information on their distributions in the document collection. This can be adequately modeled using the concept of maximal termsets derived from the formalism of association rules theory. Experimentation shows that our technique leads to improved results. For the TREC-8 test collection, for instance, our technique led to gains in average precision of roughly 28% with regard to a BM25 ranking formula.
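
The abstract does not spell out how the maximal termsets are computed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a termset is "frequent" when at least `min_support` documents contain all of its terms, and "maximal" when no frequent proper superset exists. The function name `maximal_termsets`, the `min_support` parameter, and the toy collection are hypothetical, and the brute-force enumeration is only practical for short queries.

```python
from itertools import combinations


def maximal_termsets(query_terms, documents, min_support=1):
    """Return the maximal frequent termsets of a query (illustrative sketch).

    Assumption: a termset is frequent if at least `min_support` documents
    contain every one of its terms, and maximal if it has no frequent
    proper superset.
    """
    terms = sorted(set(query_terms))
    doc_sets = [set(doc) for doc in documents]

    maximal = []
    # Enumerate candidates from largest to smallest, so every maximal
    # termset that could cover a smaller candidate is already recorded.
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            candidate = frozenset(combo)
            support = sum(1 for doc in doc_sets if candidate <= doc)
            if support < min_support:
                continue
            # Keep the candidate only if no recorded termset strictly contains it.
            if not any(candidate < kept for kept in maximal):
                maximal.append(candidate)
    return maximal


# Toy collection (hypothetical data): each document is a list of terms.
docs = [
    ["cheap", "flights", "lisbon"],
    ["flights", "lisbon", "airport"],
    ["cheap", "hotel", "lisbon"],
    ["direct", "flights"],
]
query = ["cheap", "direct", "flights", "lisbon"]

subqueries = maximal_termsets(query, docs, min_support=1)
for termset in subqueries:
    print(sorted(termset))
```

In this toy collection no document contains all four query terms, so a strictly conjunctive evaluation would return nothing. The sketch instead yields two smaller subqueries, each of which matches at least one document. How the subqueries are then scored and combined into a ranking is not covered here.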