Text reuse detection aims to identify duplicates, reformulations, or partial rewrites of a given text. Some previous research has focused on accurately identifying text reuse instances in local corpora. However, the practical use of text reuse detection on the web has remained largely untested. In this work, we 1) introduce a novel text reuse search interface for the web, based on a previously proposed architecture, 2) evaluate its feasibility, and 3) investigate techniques to improve both effectiveness and efficiency. Our results show that exhaustive query submission using n-grams can dramatically reduce the execution time with only small losses in accuracy.
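To make the idea of n-gram query submission concrete, the following is a minimal sketch of how a suspicious text might be split into word n-grams, each of which could then be sent to a web search engine as a phrase query. The abstract does not specify the n-gram length, the tokenization, or how queries are selected, so the function name `ngram_queries` and the parameters `n` and `step` are illustrative assumptions rather than the method used in this work.

```python
from typing import List


def ngram_queries(text: str, n: int = 5, step: int = 1) -> List[str]:
    """Split text into word n-grams usable as search queries.

    Hypothetical helper: whitespace tokenization and the defaults
    for n and step are assumptions, not taken from the paper.
    """
    words = text.split()
    if not words:
        return []
    if len(words) < n:
        # Text shorter than one n-gram: submit it as a single query.
        return [" ".join(words)]
    # Slide a window of n words over the text, advancing by `step` words.
    return [
        " ".join(words[i:i + n])
        for i in range(0, len(words) - n + 1, step)
    ]


if __name__ == "__main__":
    sample = "text reuse detection aims to identify duplicates"
    for query in ngram_queries(sample, n=3):
        print(query)
```

With `step=1` every overlapping n-gram becomes a query, which corresponds to an exhaustive submission strategy under these assumptions; a larger `step` submits fewer queries at the cost of coverage.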