Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response time for querying. We also contrast performance to the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.