Signature extraction for overlap detection in documents

Authors:
Raphael A. Finkel; Arkady Zaslavsky; Krisztián Monostori; Heinz Schmidt
Affiliations:
University of Kentucky, Lexington, KY (Finkel); Monash University, Melbourne, Australia (Zaslavsky, Monostori, Schmidt)
Venue:
ACSC '02: Proceedings of the Twenty-fifth Australasian Conference on Computer Science, Volume 4
Year:
2002

Abstract

Easy access to the Web has increased the potential for students to cheat on assignments by plagiarising others' work. By the same token, Web-based tools let instructors check submitted assignments for signs of plagiarism. Overlap-detection tools are easy to use and accurate at detecting plagiarism, so they can be an excellent deterrent. Documents can also overlap for other reasons: old documents are superseded, and authors summarize their previous work identically in several papers. Overlap-detection tools can pinpoint such interconnections in a corpus of documents and could also be used in search engines.

We describe a Web-accessible text registry based on signature extraction. From each registered text we extract a small but diagnostic signature, which is stored permanently and compared against the other stored signatures. This comparison lets us estimate the amount of overlap between pairs of documents while keeping the total time required linear in the total size of the documents. We compare our algorithm with several alternatives and present both efficiency and accuracy results.
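As a rough illustration of how a signature-based registry might work, the sketch below hashes every run of k consecutive words and keeps only the hashes that are 0 mod p. This is a standard sampling technique, not necessarily the selection scheme evaluated in this paper. The names `chunk_hashes`, `signature`, and `Registry`, and the parameters `k` and `p`, are hypothetical. Stored signatures go into an inverted index from hash to document IDs, so a new text can be compared against every registered document in a single pass over its own signature.

```python
import hashlib
import re
from collections import defaultdict


def chunk_hashes(text, k=5):
    """Hash every run of k consecutive words (word k-grams) to a 64-bit int."""
    words = re.findall(r"\w+", text.lower())
    hashes = []
    for i in range(len(words) - k + 1):
        chunk = " ".join(words[i:i + k]).encode("utf-8")
        hashes.append(int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big"))
    return hashes


def signature(text, k=5, p=4):
    """Hypothetical signature: keep only chunk hashes that are 0 mod p.

    Selection depends only on the hash value, not on the chunk's position,
    so the same shared passage yields the same sampled hashes in both texts.
    """
    return {h for h in chunk_hashes(text, k) if h % p == 0}


class Registry:
    """Hypothetical text registry: stores signatures and an inverted index."""

    def __init__(self, k=5, p=4):
        self.k, self.p = k, p
        self.sigs = {}                  # doc_id -> stored signature
        self.index = defaultdict(set)   # hash -> doc_ids containing it

    def register(self, doc_id, text):
        sig = signature(text, self.k, self.p)
        self.sigs[doc_id] = sig
        for h in sig:
            self.index[h].add(doc_id)
        return sig

    def overlaps(self, text):
        """Estimate, for each stored document, the fraction of this text's
        sampled chunks that also occur in that document."""
        sig = signature(text, self.k, self.p)
        if not sig:
            return {}
        shared = defaultdict(int)
        for h in sig:
            for doc_id in self.index.get(h, ()):
                shared[doc_id] += 1
        return {d: n / len(sig) for d, n in shared.items()}
```

Choosing p trades signature size against sensitivity: a larger p stores roughly 1/p of the chunks, but a short shared passage may then contribute no sampled hash and go undetected.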