Easy access to the Web has increased the potential for students to cheat on assignments by plagiarising others' work. By the same token, Web-based tools offer the potential for instructors to check submitted assignments for signs of plagiarism. Overlap-detection tools are easy to use and accurate in plagiarism detection, so they can be an excellent deterrent to plagiarism. Documents can overlap for other reasons, too: old documents are superseded, and authors summarize previous work identically in several papers. Overlap-detection tools can pinpoint interconnections in a corpus of documents and could be used in search engines.

We describe a web-accessible text registry based on signature extraction. We extract a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures. This comparison allows us to estimate the amount of overlap between pairs of documents, while the total time required remains linear in the total size of the documents. We compare our algorithm with several alternatives and present both efficiency and accuracy results.
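The abstract does not spell out how signatures are extracted, so the following is only an illustrative sketch of the general idea, not the method the paper describes. It assumes one common approach: hash overlapping word n-grams ("shingles"), keep the hashes divisible by a fixed modulus as the signature, and estimate overlap from the hashes two signatures share. The function names (`shingles`, `signature`, `overlap`) and the parameters `n` and `keep_mod` are hypothetical choices made for this example.

```python
import hashlib
import re


def shingles(text, n=4):
    """Return every run of n consecutive words in the text (assumed unit of comparison)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def signature(text, n=4, keep_mod=4):
    """Keep only shingle hashes divisible by keep_mod (hypothetical selection rule).

    On average about 1/keep_mod of the shingles survive, so the stored
    signature is much smaller than the text, and one pass over the text
    is enough to build it.
    """
    sig = set()
    for shingle in shingles(text, n):
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        if value % keep_mod == 0:
            sig.add(value)
    return sig


def overlap(sig_a, sig_b):
    """Estimate overlap as the shared fraction of the smaller signature."""
    if not sig_a or not sig_b:
        return 0.0
    return len(sig_a & sig_b) / min(len(sig_a), len(sig_b))


if __name__ == "__main__":
    original = "we extract a small but diagnostic signature from each registered text"
    copied = original + " and then compare it against other stored signatures"
    # keep_mod=1 keeps every shingle hash, which makes this tiny demo deterministic.
    print(overlap(signature(original, keep_mod=1), signature(copied, keep_mod=1)))
```

Because the selection rule depends only on the hash value, the same shingle is either kept in every document or dropped in every document, which is why a small sample can still reveal shared passages. A registry built this way could index stored hashes so that each new signature is compared by lookup rather than by pairing it with every stored document; that, too, is an assumption of this sketch rather than a detail taken from the abstract.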