The widespread use of on-line publishing of text promotes the storage of multiple versions of documents and the mirroring of documents in multiple locations, and greatly simplifies the task of plagiarizing the work of others. We evaluate two families of methods for searching a collection to find documents that are co-derivative, that is, documents that are versions or plagiarisms of each other. The first, the ranking family, uses information retrieval techniques; extending this family, we propose the identity measure, which is specifically designed for identifying co-derivative documents. The second, the fingerprinting family, uses hashing to generate a compact document description, which can then be compared with the fingerprints of the documents in the collection. We introduce a new method for evaluating the effectiveness of these techniques, and demonstrate it in practice. Using experiments on two collections, we demonstrate that the identity measure and the best fingerprinting technique are both able to accurately identify co-derivative documents. However, for fingerprinting, the parameters must be chosen carefully, and even then the identity measure is clearly superior.
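The fingerprinting family described above can be illustrated with a small sketch. This is a generic example, not the authors' exact method. It makes several assumptions: documents are split into word n-grams ("shingles"), each shingle is hashed to a 64-bit value taken from MD5, only hashes that are 0 mod p are kept (one common selection heuristic), and two fingerprints are compared with a Jaccard-style resemblance score. The function names `fingerprint`, `resemblance` and `rank_collection`, and the parameters `n` and `p`, are hypothetical.

```python
import hashlib
import re


def fingerprint(text: str, n: int = 3, p: int = 4) -> set[int]:
    """Compact document description: hashes of word n-grams, keeping only those that are 0 mod p.

    n and p are illustrative parameters; larger p yields smaller fingerprints.
    """
    words = re.findall(r"\w+", text.lower())
    prints: set[int] = set()
    for i in range(len(words) - n + 1):
        shingle = " ".join(words[i:i + n])
        # Use a stable hash; Python's built-in hash() on str is salted per process.
        h = int.from_bytes(hashlib.md5(shingle.encode("utf-8")).digest()[:8], "big")
        if h % p == 0:  # "0 mod p" selection heuristic
            prints.add(h)
    return prints


def resemblance(a: set[int], b: set[int]) -> float:
    """Share of fingerprint hashes the two documents have in common (Jaccard coefficient)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_collection(query: str, collection: dict[str, str],
                    n: int = 3, p: int = 4) -> list[tuple[str, float]]:
    """Compare the query's fingerprint with every document's fingerprint, best match first."""
    q = fingerprint(query, n, p)
    scores = [(doc_id, resemblance(q, fingerprint(text, n, p)))
              for doc_id, text in collection.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "orig": "the quick brown fox jumps over the lazy dog",
        "edit": "the quick brown fox leaps over the lazy dog",
        "other": "completely unrelated text about information retrieval systems",
    }
    for doc_id, score in rank_collection(docs["orig"], docs, n=3, p=1):
        print(doc_id, round(score, 2))
```

With `p=1` every shingle is kept. The lightly edited version then shares 4 of 10 distinct trigrams with the original and scores 0.4, while the unrelated text scores 0. Raising `p` or changing `n` shrinks or reshapes the fingerprint. On short documents this can leave too few hashes to compare, which is one concrete reason why fingerprinting parameters need careful tuning.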