The proliferation of digital libraries and the availability of electronic documents on the Internet have created new challenges for computer science researchers and professionals. Documents are easily copied and redistributed, or used to create plagiarised assignments and conference papers. This paper presents a new two-stage approach for identifying overlapping documents. The first stage identifies a set of candidate documents, which the second stage then compares using a matching engine. The matching engine's algorithm is based on suffix trees and modifies the known matching statistics algorithm. Parallel and distributed approaches are discussed for both stages, and performance results are presented.
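The two-stage idea can be illustrated with a minimal sketch. It is not the paper's implementation: the candidate filter below uses word-shingle resemblance as an assumed stand-in for the first stage, and the matching statistics are computed with plain substring search rather than a suffix tree. All function names, parameters, and thresholds are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word windows (shingles) in text."""
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b):
    """Jaccard resemblance of two shingle sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidates(query, corpus, k=3, threshold=0.3):
    """Stage 1 (assumed filter): names of corpus documents that resemble query."""
    q = shingles(query, k)
    return [name for name, text in corpus.items()
            if resemblance(q, shingles(text, k)) >= threshold]


def matching_statistics(pattern, text):
    """Stage 2: ms[i] is the length of the longest substring of pattern
    starting at i that occurs somewhere in text.

    Naive substring search is used here; a suffix tree of text would
    produce the same values in linear time.
    """
    ms = []
    length = 0
    for i in range(len(pattern)):
        # A match of length L at i-1 implies a match of at least L-1 at i.
        length = max(length - 1, 0)
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def overlap_spans(ms, min_len):
    """Return (start, length) for matches of at least min_len characters
    that are not contained in the match starting one position earlier."""
    return [(i, m) for i, m in enumerate(ms)
            if m >= min_len and (i == 0 or ms[i - 1] < m + 1)]
```

For example, `matching_statistics("abcx", "zabcy")` gives `[3, 2, 1, 0]`, and `overlap_spans` with `min_len=2` reduces that to the single span `(0, 3)`. In a full pipeline, only the documents returned by `candidates` would be passed to the more expensive second stage.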