Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Using Noun Phrase Heads to Extract Document Keyphrases
AI '00 Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence
Constructing Web search queries from the user's information need expressed in a natural language
Proceedings of the 2003 ACM symposium on Applied computing
Maximal termsets as a query structuring mechanism
Proceedings of the 14th ACM international conference on Information and knowledge management
Introducing the user-over-ranking hypothesis
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Candidate document retrieval for web-scale text reuse detection
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
From keywords to keyqueries: content descriptors for the web
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Given a set of keyphrases, we analyze how to form Web queries from these phrases such that the queries, taken together, return a specified number of hits. The motivating use case is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For this query formulation problem, we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings are 50% on average when queries are to be constructed for 9 or 10 phrases.
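To make the idea of a co-occurrence-based heuristic more concrete, the following is a minimal sketch and not the strategy from the paper itself. It assumes that single-phrase and pairwise document counts are available locally, that the hit count of a conjunctive query can be roughly estimated from pairwise conditional probabilities, and that each query is capped individually as a simplification of the "specified number of hits" constraint. All names (`cooccurrence_prob`, `estimate_hits`, `formulate_queries`, `max_hits`) are hypothetical.

```python
from typing import Dict, FrozenSet, List


def cooccurrence_prob(p: str, t: str,
                      doc_freq: Dict[str, int],
                      pair_freq: Dict[FrozenSet[str], int]) -> float:
    """P(p | t): fraction of documents containing t that also contain p."""
    return pair_freq.get(frozenset((p, t)), 0) / doc_freq[t]


def estimate_hits(query: List[str],
                  doc_freq: Dict[str, int],
                  pair_freq: Dict[FrozenSet[str], int]) -> float:
    """Estimate the hits of a conjunctive query without submitting it.

    Assumed estimator (not taken from the paper): start with the count of
    the first phrase and scale by the weakest pairwise conditional
    probability for every phrase added afterwards.
    """
    hits = float(doc_freq[query[0]])
    for i in range(1, len(query)):
        hits *= min(cooccurrence_prob(query[i], t, doc_freq, pair_freq)
                    for t in query[:i])
    return hits


def formulate_queries(phrases: List[str],
                      doc_freq: Dict[str, int],
                      pair_freq: Dict[FrozenSet[str], int],
                      max_hits: float) -> List[List[str]]:
    """Greedily pack phrases into queries with estimated hits <= max_hits.

    Every phrase ends up in exactly one query. The last query may still
    exceed the cap if no phrases are left to narrow it further.
    """
    remaining = sorted(phrases, key=lambda p: doc_freq[p])  # rarest first
    queries: List[List[str]] = []
    while remaining:
        query = [remaining.pop(0)]
        # Add phrases until the estimate drops to the cap or we run out.
        while remaining and estimate_hits(query, doc_freq, pair_freq) > max_hits:
            query.append(remaining.pop(0))
        queries.append(query)
    return queries
```

In this sketch, only the final queries would be sent to the search engine; the estimates stand in for the intermediate submissions that a non-heuristic strategy would need. Whether such estimates are accurate enough in practice depends on the corpus statistics and is not something this example establishes.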