Local reuse detection is a prerequisite for a multitude of tasks, ranging from document management and information retrieval to web search and plagiarism detection. Its results can support authors in creating new learning resources, and learners in finding existing ones, by providing accurate suggestions for related documents. While the detection of local text reuse, i.e. the reuse of parts of documents, is covered by various approaches, reuse detection for object-based documents has hardly been considered so far. In this paper we propose a new fingerprinting technique for local reuse detection in both text-based and object-based documents, which is based on the contiguity of documents. This additional information, which existing approaches generally disregard, allows the creation of shorter and more flexible fingerprints. Evaluations on different corpora show that the technique outperforms existing approaches while consuming significantly less storage.
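For orientation, the kind of conventional fingerprinting that such a technique is compared against can be sketched as follows. This is a minimal illustration of generic word k-gram (shingle) hashing with a containment score, and it is not the contiguity-based technique proposed in the paper. The function names, the shingle length `k`, and the `modulus` selection parameter are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=3, modulus=1):
    """Hash every shingle and keep the hashes divisible by `modulus`.

    modulus=1 keeps all hashes; larger values keep roughly 1/modulus
    of them, trading detection accuracy for a smaller fingerprint.
    (Hypothetical parameters, chosen for illustration only.)
    """
    selected = set()
    for shingle in shingles(text, k):
        digest = hashlib.sha1(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        if value % modulus == 0:
            selected.add(value)
    return selected


def containment(fp_a, fp_b):
    """Fraction of the hashes in fp_a that also occur in fp_b."""
    if not fp_a:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a)


source = "local reuse detection is a prerequisite for many tasks"
reuse = "we note that local reuse detection is a prerequisite here"
score = containment(fingerprint(reuse), fingerprint(source))
```

Here `score` measures how much of the second text is covered by the first. In this baseline the fingerprint grows with the number of shingles retained, which is the storage cost that shorter fingerprints aim to reduce.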