Nowhere to Hide: Finding Plagiarized Documents Based on Sentence Similarity

Authors:
Nathaniel Gustafson;Maria Soledad Pera;Yiu-Kai Ng
Affiliations:
-;-;-
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2008

Citing 10
Cited 1

Fast Approximate Similarity Search in Extremely High-Dimensional Data Sets

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Improved annotation of the blogosphere via autotagging and hierarchical clustering

Proceedings of the 15th international conference on World Wide Web
HT06, tagging paper, taxonomy, Flickr, academic article, to read

Proceedings of the seventeenth conference on Hypertext and hypermedia
P-TAG: large scale automatic generation of personalized annotation tags for the web

Proceedings of the 16th international conference on World Wide Web
Combating spam in tagging systems

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Authors vs. readers: a comparative study of document metadata and content in the www

Proceedings of the 2007 ACM symposium on Document engineering
Can social bookmarking improve web search?

WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Finding similar pages in a social tagging repository

Proceedings of the 17th international conference on World Wide Web
Social tag prediction

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Ontologies are us: a unified model of social networks and semantics

ISWC'05 Proceedings of the 4th international conference on The Semantic Web

Plagiarism detection based on structural information

Proceedings of the 20th ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by authors (owners) of the original copies. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates agraphical view of sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-) plagiarized documents and outperforms existing plagiarism-detection approaches.