This paper presents a systematic framework that uses a multilevel matching approach for plagiarism detection (PD). Each document is represented by a multilevel structure, i.e., document-paragraph-sentence. At the document and paragraph levels, we use a traditional dimensionality reduction technique to project high-dimensional histograms into a latent semantic space. The Earth Mover's Distance (EMD), rather than exhaustive matching, is employed to retrieve relevant documents, which markedly shrinks the search domain. Two PD algorithms are designed and implemented to efficiently flag the suspected sources of plagiarized documents. We conduct extensive experiments covering document retrieval, PD, the effects of parameters, and an empirical study of the system response. The results corroborate that the proposed approach is accurate and computationally efficient for PD.
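
The coarse-to-fine idea behind a document-paragraph-sentence structure can be illustrated with a minimal sketch. It is not the paper's method: it compares raw term histograms with cosine similarity as a simple stand-in for both the latent-space projection and the EMD-based retrieval, and every name and threshold in it (`build_tree`, `find_suspect_sentences`, `doc_t`, `par_t`, `sen_t`) is a hypothetical choice made for illustration.

```python
import math
import re
from collections import Counter


def histogram(text):
    """Term-frequency histogram of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(h1, h2):
    """Cosine similarity of two term histograms (0.0 if either is empty)."""
    dot = sum(count * h2[term] for term, count in h1.items())
    norm1 = math.sqrt(sum(c * c for c in h1.values()))
    norm2 = math.sqrt(sum(c * c for c in h2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def build_tree(document):
    """Represent a document as document -> paragraphs -> sentences.

    Assumes paragraphs are separated by blank lines and sentences end
    with '.', '!' or '?'.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return {
        "hist": histogram(document),
        "paragraphs": [
            {
                "hist": histogram(p),
                "sentences": [
                    s.strip()
                    for s in re.split(r"(?<=[.!?])\s+", p)
                    if s.strip()
                ],
            }
            for p in paragraphs
        ],
    }


def find_suspect_sentences(query, source, doc_t=0.3, par_t=0.4, sen_t=0.8):
    """Return (query sentence, source sentence) pairs that look copied.

    Each level prunes the search before the next, finer level runs.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    q, s = build_tree(query), build_tree(source)
    # Level 1: skip the whole source if the documents are dissimilar.
    if cosine(q["hist"], s["hist"]) < doc_t:
        return []
    matches = []
    for qp in q["paragraphs"]:
        for sp in s["paragraphs"]:
            # Level 2: only descend into similar paragraph pairs.
            if cosine(qp["hist"], sp["hist"]) < par_t:
                continue
            # Level 3: compare sentences within the surviving pair.
            for qs in qp["sentences"]:
                for ss in sp["sentences"]:
                    if cosine(histogram(qs), histogram(ss)) >= sen_t:
                        matches.append((qs, ss))
    return matches
```

Calling `find_suspect_sentences(query_text, source_text)` returns the sentence pairs that survive all three levels. The point of the structure is that sentence-level comparison, the most expensive step, only runs on paragraph pairs that already passed the coarser checks.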