Comparison of Overlap Detection Techniques

Authors:
Krisztián Monostori;Raphael A. Finkel;Arkady B. Zaslavsky;Gábor Hodász;Máté Pataki
Venue:
ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Year:
2002

Abstract

Easy access to the World Wide Web has raised concerns about copyright issues and plagiarism: it is easy to copy someone else's work and submit it as one's own. Many systems have targeted this problem, and they use very similar approaches. This paper compares these approaches and suggests when one strategy is more applicable than another. Some alternative approaches are proposed that perform better than previously presented methods. The previous methods share two common stages: chunking of documents and selection of representative chunks. We study both stages and propose alternatives that are better in terms of accuracy and space requirements. The applications of these methods are not limited to plagiarism detection; they may also target other copy-detection problems. We also propose a third stage, applied during comparison, that uses suffix trees and suffix vectors to identify the overlapping chunks.
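
The two stages named in the abstract, chunking and selection of representative chunks, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the overlapping three-word chunks, the SHA-256 fingerprint, the "keep hashes divisible by a modulus" selection rule, and all function names are assumptions chosen for illustration, and the suffix-tree stage is not shown.

```python
import hashlib


def word_chunks(text, size=3):
    """Stage 1 (assumed strategy): overlapping chunks of `size` consecutive words."""
    words = text.lower().split()
    if len(words) < size:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def chunk_hash(chunk):
    """Stable integer fingerprint of a chunk (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(text, size=3, modulus=1):
    """Stage 2 (assumed strategy): keep only hashes divisible by `modulus`.

    modulus=1 keeps every chunk; a larger value stores fewer chunks,
    trading accuracy for space.
    """
    return {h for h in map(chunk_hash, word_chunks(text, size)) if h % modulus == 0}


def overlap(a, b, size=3, modulus=1):
    """Fraction of the selected chunks of document `a` that also occur in `b`."""
    fa, fb = fingerprint(a, size, modulus), fingerprint(b, size, modulus)
    return len(fa & fb) / len(fa) if fa else 0.0


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "yesterday the quick brown fox jumps over a fence"
    print(overlap(original, suspect))
```

Because both documents are filtered by the same rule, a chunk retained in one is retained in the other whenever it occurs there, so raising `modulus` shrinks the stored fingerprint at the cost of possibly missing short overlaps.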