In plagiarism detection (PD) systems, two important problems must be addressed: retrieving candidate documents that are globally similar to a document q under investigation, and comparing q side by side with its candidates to pinpoint plagiarized fragments in detail. In this article, the authors investigate the use of the structural information of scientific publications in both problems, and the use of citation evidence in the second. Three statistical measures, namely Inverse Generic Class Frequency, Spread, and Depth, are introduced to assign a degree of importance (i.e., a weight) to the structural components of scientific articles. A term-weighting scheme is adjusted to incorporate these component-weight factors and is used to improve the retrieval of potential sources of plagiarism. Plagiarism screening is then performed with a measure of resemblance, in which the component-weight factors are exploited to ignore less significant or nonsignificant plagiarism cases. Using the notion of citation evidence, fragments with proper citation evidence are excluded, and the remaining cases are flagged as suspicious and used to calculate the similarity index. The authors compare their approach with two flat (non-structural) baselines: TF-IDF weighting with the cosine coefficient, and shingling with the Jaccard coefficient. For plagiarism screening, both baselines are run with different comparison units and overlap measures. Extensive experiments were conducted on a dataset of 15,412 documents (8,657 source publications and 6,755 suspicious queries) with 18,147 automatically inserted plagiarism cases. The component-weight factors are assessed using precision, recall, and F-measure averaged over 10-fold cross-validation, and are compared using an ANOVA test.
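The two flat baselines named above are standard techniques, so they can be sketched directly. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it builds TF-IDF vectors compared with the cosine coefficient, and word k-shingles compared with the Jaccard coefficient. To hint at how component-weight factors could enter a term-weighting scheme, `tfidf_vectors` accepts a hypothetical `component_weights` mapping that scales each term occurrence by the weight of the structural component (e.g., "references") it appears in. This weighting is an assumption, since the abstract does not give the paper's formula, and the sketch does not implement Inverse Generic Class Frequency, Spread, or Depth. The tokenizer, the smoothed IDF, and the shingle size k = 3 are likewise assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs, component_weights=None):
    """TF-IDF vectors for documents given as lists of (component, text) pairs.

    component_weights is a hypothetical mapping such as {"references": 0.0};
    each term occurrence is counted with the weight of the component it
    appears in. Leaving it empty yields the flat TF-IDF baseline.
    """
    component_weights = component_weights or {}
    tfs = []
    for doc in docs:
        tf = Counter()
        for component, text in doc:
            weight = component_weights.get(component, 1.0)
            if weight == 0:
                continue  # skip components judged insignificant
            for term in tokenize(text):
                tf[term] += weight
        tfs.append(tf)
    n = len(docs)
    df = Counter(term for tf in tfs for term in tf)  # document frequency
    # Smoothed IDF (an assumption) so terms shared by every document keep a nonzero weight.
    return [{t: f * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in tf.items()}
            for tf in tfs]


def cosine(u, v):
    """Cosine coefficient between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shingles(text, k=3):
    """Set of contiguous k-word shingles."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With an empty `component_weights`, `tfidf_vectors` reduces to the flat baseline; setting a component's weight to 0 drops that component entirely, which is one simple way of ignoring text judged insignificant.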
Results from structure-based candidate retrieval and plagiarism detection are evaluated statistically against the flat baselines using paired t-tests on the 10-fold cross-validation runs, which demonstrate the efficacy of the proposed framework. An empirical study of the system's responses shows that structural information helps to flag significant plagiarism cases, improve the similarity index, and provide human-like plagiarism screening results, something existing plagiarism detectors do not do. © 2012 Wiley Periodicals, Inc.
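The statistical comparison described above, per-fold scores from 10-fold cross-validation compared with a paired t-test, can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it computes the F-measure from precision and recall and the paired t statistic from two lists of per-fold scores, then compares |t| with the tabulated two-tailed critical value of Student's t for 9 degrees of freedom at α = 0.05 (2.262) instead of computing an exact p-value. The function names and the constant `T_CRIT_10_FOLDS` are hypothetical, and the ANOVA over component-weight factors is not reproduced.

```python
import math
from statistics import mean, stdev


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def paired_t(scores_a, scores_b):
    """Paired t statistic for two systems scored on the same cross-validation folds.

    t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-fold differences and
    sd is the sample standard deviation; the statistic has n - 1 degrees of freedom.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least two folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("per-fold differences are constant; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Two-tailed critical value of Student's t for 9 degrees of freedom (10 folds), alpha = 0.05.
T_CRIT_10_FOLDS = 2.262


def significantly_different(scores_a, scores_b):
    """True if |t| exceeds the critical value; only valid for exactly 10 paired folds."""
    if len(scores_a) != 10:
        raise ValueError("T_CRIT_10_FOLDS only applies to 10 paired folds")
    return abs(paired_t(scores_a, scores_b)) > T_CRIT_10_FOLDS
```

To use it, pass the ten per-fold F-measures of the structure-based system and of a flat baseline, computed on the same folds. If SciPy is available, `scipy.stats.ttest_rel` returns the same paired statistic together with a p-value.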