CASCON '07 Proceedings of the 2007 conference of the center for advanced studies on Collaborative research
Generating links by mining quotations
Proceedings of the nineteenth ACM conference on Hypertext and hypermedia
How opinions are received by online communities: a case study on amazon.com helpfulness votes
Proceedings of the 18th international conference on World wide web
Efficient privacy-preserving similar document detection
The VLDB Journal — The International Journal on Very Large Data Bases
Detection of simple plagiarism in computer science papers
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
ATLAS: a probabilistic algorithm for high dimensional similarity search
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Experiments with filtered detection of similar academic papers
AIMSA'12 Proceedings of the 15th international conference on Artificial Intelligence: methodology, systems, and applications
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering several research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
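The abstract does not specify the detection method. The minimal Python sketch below shows one way overlap screening can run fast enough to check each new submission as it arrives. It uses a common baseline technique, word n-gram shingling with an inverted index, and is not claimed to be the authors' method. `ShingleIndex`, `screen`, `n`, and `threshold` are hypothetical names and parameters chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def shingles(text, n):
    """Return the set of lowercased word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class ShingleIndex:
    """Hypothetical inverted index mapping each shingle to the documents containing it.

    This is a generic baseline sketch, not the method described in the abstract.
    """

    def __init__(self, n=7):
        self.n = n
        self.postings = defaultdict(set)  # shingle -> set of doc ids

    def add(self, doc_id, text):
        """Index an already-accepted document."""
        for sh in shingles(text, self.n):
            self.postings[sh].add(doc_id)

    def screen(self, text, threshold=0.2):
        """Return (doc_id, fraction) pairs, sorted by descending fraction.

        fraction is the share of the submission's shingles that also occur in
        doc_id. Only documents at or above the hypothetical threshold are kept.
        Cost scales with the submission's shingles and their postings, not with
        the size of the whole collection.
        """
        s = shingles(text, self.n)
        if not s:
            return []
        counts = Counter()
        for sh in s:
            # .get avoids inserting empty entries into the defaultdict
            for doc_id in self.postings.get(sh, ()):
                counts[doc_id] += 1
        hits = [(d, c / len(s)) for d, c in counts.items() if c / len(s) >= threshold]
        return sorted(hits, key=lambda pair: -pair[1])
```

For example, after indexing two short texts with `n=3`, screening a submission that reuses the opening of the first text flags only that document. In a realistic setting, false positives from stock phrases would still need filtering, much as the abstract's heuristics are meant to do.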