Search engines process queries conjunctively to restrict the size of the answer set. Moreover, it is not uncommon to observe a mismatch between the vocabulary used in the text of Web pages and the terms used to compose Web queries. The combination of these two features may lead to irrelevant query results, particularly in the case of more specific queries composed of three or more terms. To deal with this problem we propose a new technique for automatically structuring Web queries as a set of smaller subqueries. To select representative subqueries we use information on their distribution in the document collection. This can be adequately modeled using the concept of maximal termsets, derived from the formalism of association rule theory. Experimentation shows that our technique leads to improved results. For the TREC-8 test collection, for instance, it yields gains in average precision of roughly 28% over a BM25 ranking formula.
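The idea of maximal termsets can be illustrated with a minimal sketch. Assuming each document is represented as a set of terms, a termset is a subset of the query terms that co-occurs in at least `min_support` documents, and it is maximal if no frequent superset of it exists. The brute-force enumeration below is only an illustration under these assumptions, not the mining algorithm used in the paper; the names `maximal_frequent_termsets` and `min_support` are hypothetical.

```python
from itertools import combinations

def maximal_frequent_termsets(query_terms, documents, min_support):
    """Enumerate subsets of the query terms that occur together in at
    least min_support documents, then keep only the maximal ones."""
    frequent = []
    for k in range(1, len(query_terms) + 1):
        for subset in combinations(sorted(query_terms), k):
            support = sum(1 for doc in documents if set(subset) <= doc)
            if support >= min_support:
                frequent.append(frozenset(subset))
    # A termset is maximal if it is not a proper subset of another
    # frequent termset.
    return [s for s in frequent if not any(s < t for t in frequent)]

# Toy collection: each document is a set of terms.
docs = [
    {"web", "query", "retrieval"},
    {"web", "query"},
    {"query", "retrieval", "ranking"},
]
result = maximal_frequent_termsets(["web", "query", "retrieval"], docs, 2)
print(sorted(tuple(sorted(s)) for s in result))
# [('query', 'retrieval'), ('query', 'web')]
```

In this toy example the full three-term query matches only one document, so it is split into the two maximal termsets, each of which could serve as a smaller subquery. Brute-force enumeration is exponential in the number of query terms; it is adequate here only because Web queries are short.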