Robust plagiary detection using semantic compression augmented SHAPD

Authors:
Dariusz Ceglarek;Konstanty Haniewicz;Wojciech Rutkowski
Affiliations:
Poznan School of Banking, Poland;Poznan University of Economics, Poland;Ciber, Poland
Venue:
ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I
Year:
2012

Abstract

This work presents results of ongoing research in the areas of semantic networks, plagiarism detection, and general natural language processing. The results demonstrate that semantic compression is a valuable addition to existing methods used in plagiary detection. Applying semantic compression boosts the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection (SHAPD) and of the authors' implementation of the w-shingling algorithm. Tests were also run with the traditional Vector Space Model; they demonstrated that, contrary to general belief, this technique is not well suited to plagiary detection. All experiments were performed on a generally available corpus so that the analysis can be compared with the efforts of other research teams.
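
As a rough illustration of the idea in the abstract, the following Python sketch computes a w-shingling resemblance (Jaccard overlap of word w-grams) between two texts, with and without a simplified stand-in for semantic compression. It is not the authors' implementation, and SHAPD itself is not reproduced. Semantic compression is approximated here, as an assumption, by replacing specific terms with more general ones from a small hand-made table; the table `TOY_GENERALIZATIONS` and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy table mapping specific terms to more general ones.
# A real system would derive such a mapping from a semantic network.
TOY_GENERALIZATIONS: Dict[str, str] = {
    "sedan": "car",
    "hatchback": "car",
    "rapidly": "quickly",
}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, stripping punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def compress(tokens: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each token by its more general term, if the table has one."""
    return [mapping.get(t, t) for t in tokens]


def shingles(tokens: List[str], w: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous w-word sequences (shingles)."""
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: str, b: str, w: int = 3,
                mapping: Optional[Dict[str, str]] = None) -> float:
    """Jaccard resemblance of the shingle sets of two texts."""
    ta, tb = tokenize(a), tokenize(b)
    if mapping is not None:
        ta, tb = compress(ta, mapping), compress(tb, mapping)
    sa, sb = shingles(ta, w), shingles(tb, w)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "The sedan moved rapidly down the road."
suspect = "The hatchback moved quickly down the road."

plain = resemblance(original, suspect)
compressed = resemblance(original, suspect, mapping=TOY_GENERALIZATIONS)
print(f"without compression: {plain:.3f}")
print(f"with compression:    {compressed:.3f}")
```

In this toy pair the two sentences differ only by word substitutions, so plain shingling finds a single shared shingle, while the generalized token streams coincide. The example only shows the mechanism and says nothing about the effect sizes reported in the paper.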