SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

Authors:
Maria Soledad Pera;Yiu-Kai Ng
Affiliations:
-; (Corresponding author) 3361 TMCB, Computer Science Department, Brigham Young University, Provo, Utah, USA, E-mail: {ng,mpera}@cs.byu.edu
Venue:
Web Intelligence and Agent Systems
Year:
2011

Abstract

Plagiarism is a serious problem: it infringes copyrighted documents and materials, is unethical, and decreases the economic incentive received by their legal owners. The problem is getting worse as the number of on-line publications grows and easy access on the Web makes it simpler to locate and paraphrase information. To address it, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity, computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of the sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
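
The abstract does not give SimPaD's formulas, so the following Python sketch only illustrates the general idea of sentence-to-sentence comparison driven by word-correlation factors. Everything in it is an assumption made for illustration: the toy correlation table and its values, the function names, the averaging and minimum used to aggregate word scores, and the 0.7 threshold are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Sentence = List[str]                      # a sentence as a list of lowercase words
CorrTable = Dict[Tuple[str, str], float]  # word pair -> correlation factor in [0, 1]

# Hypothetical, hand-made correlation factors (NOT the paper's pre-defined values).
TOY_CORRELATION: CorrTable = {
    ("car", "automobile"): 0.9,
    ("fast", "quick"): 0.8,
}

def word_correlation(w1: str, w2: str, table: CorrTable) -> float:
    """Look up a symmetric correlation factor; identical words score 1.0."""
    if w1 == w2:
        return 1.0
    return table.get((w1, w2), table.get((w2, w1), 0.0))

def sentence_similarity(s1: Sentence, s2: Sentence, table: CorrTable) -> float:
    """Assumed aggregation: average, over words of s1, of the best match in s2."""
    if not s1 or not s2:
        return 0.0
    best = [max(word_correlation(a, b, table) for b in s2) for a in s1]
    return sum(best) / len(s1)

def similar_sentence_pairs(d1: List[Sentence], d2: List[Sentence],
                           table: CorrTable, threshold: float = 0.7
                           ) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for sentence pairs that match in both directions."""
    pairs = []
    for i, s1 in enumerate(d1):
        for j, s2 in enumerate(d2):
            # Take the weaker direction so a short sentence cannot trivially match.
            score = min(sentence_similarity(s1, s2, table),
                        sentence_similarity(s2, s1, table))
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs

def resemblance(d1: List[Sentence], d2: List[Sentence],
                table: CorrTable, threshold: float = 0.7) -> float:
    """Assumed document-level score: fraction of d1 sentences matched in d2."""
    if not d1:
        return 0.0
    matched = {i for i, _, _ in similar_sentence_pairs(d1, d2, table, threshold)}
    return len(matched) / len(d1)
```

Under these assumptions, the (i, j) index pairs returned by `similar_sentence_pairs` are the kind of data a graphical view of matching sentences could plot, with one axis per document.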