SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Using Noun Phrase Heads to Extract Document Keyphrases
AI '00 Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence
Maximal termsets as a query structuring mechanism
Proceedings of the 14th ACM international conference on Information and knowledge management
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Random sampling from a search engine's index
Journal of the ACM (JACM)
A survey of pre-retrieval query performance predictors
Proceedings of the 17th ACM conference on Information and knowledge management
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
The Top Ten Algorithms in Data Mining
The Top Ten Algorithms in Data Mining
A case for improved evaluation of query difficulty prediction
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Automatic retrieval of similar content using search engine query interface
Proceedings of the 18th ACM conference on Information and knowledge management
Capacity-constrained query formulation
ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Introducing the user-over-ranking hypothesis
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
The impact of spelling errors on patent search
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
From keywords to keyqueries: content descriptors for the web
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Given a document d, the task of text reuse detection is to find those passages in d that also appear, in identical or paraphrased form, in other documents. To solve this problem at web scale, keywords representing d's topics have to be combined into web queries. The retrieved web documents can then be passed to a text reuse detection system for in-depth analysis. We focus on query formulation as the crucial first step of the detection process and present a new query formulation strategy that achieves convincing results: compared to a maximal termset query formulation strategy [10, 14], which is the most sensible non-heuristic baseline, we save on average 70% of the queries in realistic experiments. With respect to the quality of the candidate documents, our heuristic retrieves documents that are, on average, more similar to the given document than the results of previously published query formulation strategies [4, 8].