Plagiarism is a serious problem: it infringes on copyrighted documents and materials, is unethical, and decreases the economic incentive received by the authors (owners) of the original copies. Unfortunately, plagiarism is getting worse as the number of on-line publications on the Web increases, which makes it easier to locate and paraphrase information. To address this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity, computed using pre-defined word-correlation factors, and (ii) generates a graphical view of the sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
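The abstract does not give SimPaD's formulas, so the following is only a minimal sketch of the general idea of sentence-to-sentence similarity driven by word-correlation factors. The `CORRELATION` table, the best-match averaging in `sentence_similarity`, the period-based sentence splitting, and the `threshold` parameter are all illustrative assumptions, not the authors' definitions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical word-correlation factors in [0, 1]; a real system would
# load a large pre-computed table rather than hard-code two pairs.
CORRELATION: Dict[FrozenSet[str], float] = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def word_corr(w1: str, w2: str) -> float:
    """Correlation of two words: 1.0 if identical, table lookup otherwise."""
    if w1 == w2:
        return 1.0
    return CORRELATION.get(frozenset((w1, w2)), 0.0)


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Assumed measure: average, over words of s1, of the best match in s2."""
    if not s1 or not s2:
        return 0.0
    return sum(max(word_corr(a, b) for b in s2) for a in s1) / len(s1)


def split_sentences(text: str) -> List[List[str]]:
    """Naive splitter: lowercase, split on periods, then on whitespace."""
    sentences = [s.split() for s in text.lower().split(".")]
    return [s for s in sentences if s]


def resemblance(
    d1: str, d2: str, threshold: float = 0.7
) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Return (fraction of d1 sentences matched in d2, list of matches).

    Each match is (index in d1, index in d2, similarity). The match list
    is the kind of data a graphical view of similar sentences could use.
    """
    s1, s2 = split_sentences(d1), split_sentences(d2)
    if not s1 or not s2:
        return 0.0, []
    matches = []
    for i, a in enumerate(s1):
        j, score = max(
            ((j, sentence_similarity(a, b)) for j, b in enumerate(s2)),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            matches.append((i, j, score))
    return len(matches) / len(s1), matches


score, matches = resemblance(
    "The car is fast. I like tea.",
    "The automobile is quick. Dogs bark loudly.",
)
print(score, matches)
```

In this toy run, the first sentence of the first document is matched to the paraphrased first sentence of the second, while the unrelated sentence is not, so half of the first document is flagged. The measure is asymmetric by construction; a symmetric variant could average the score computed in both directions.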