Search engines process queries conjunctively to restrict the size of the answer set. Moreover, it is not uncommon to observe a mismatch between the vocabulary used in the text of Web pages and the terms used to compose Web queries. The combination of these two features may lead to irrelevant query results, particularly in the case of more specific queries composed of three or more terms. To deal with this problem we propose a new technique for automatically structuring Web queries as a set of smaller subqueries. To select representative subqueries we use information on their distribution in the document collection. This can be adequately modeled using the concept of maximal termsets, derived from the formalism of association rule theory. Experimentation shows that our technique leads to improved results. For the TREC-8 test collection, for instance, it yields gains in average precision of roughly 28% over a BM25 ranking formula.
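The idea of maximal termsets can be illustrated with a minimal sketch. Assuming each document is represented as a set of terms, a termset is a subset of the query terms that co-occurs in at least `min_support` documents, and it is maximal if no frequent superset of it exists. The brute-force enumeration below is only an illustration under these assumptions, not the mining algorithm used in the paper; the names `maximal_frequent_termsets` and `min_support` are hypothetical.

```python
from itertools import combinations

def maximal_frequent_termsets(query_terms, documents, min_support):
    """Enumerate subsets of the query terms that occur together in at
    least min_support documents, then keep only the maximal ones."""
    frequent = []
    for k in range(1, len(query_terms) + 1):
        for subset in combinations(sorted(query_terms), k):
            support = sum(1 for doc in documents if set(subset) <= doc)
            if support >= min_support:
                frequent.append(frozenset(subset))
    # A termset is maximal if it is not a proper subset of another
    # frequent termset.
    return [s for s in frequent if not any(s < t for t in frequent)]

# Toy collection: each document is a set of terms.
docs = [
    {"web", "query", "retrieval"},
    {"web", "query"},
    {"query", "retrieval", "ranking"},
]
result = maximal_frequent_termsets(["web", "query", "retrieval"], docs, 2)
print(sorted(tuple(sorted(s)) for s in result))
# [('query', 'retrieval'), ('query', 'web')]
```

In this toy example the full three-term query matches only one document, so it is split into the two maximal termsets, each of which could serve as a smaller subquery. Brute-force enumeration is exponential in the number of query terms; it is adequate here only because Web queries are short.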