Using syntactic information to identify plagiarism

Authors:
Özlem Uzuner;Boris Katz;Thade Nahnsen
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
Venue:
EdAppsNLP 05 Proceedings of the second workshop on Building Educational Applications Using NLP
Year:
2005

Abstract

Using keyword overlap to identify plagiarism can produce many false negatives and false positives: substituting synonyms for one another reduces the measured similarity between works, making plagiarism difficult to recognize, while overlap in ambiguous keywords can falsely inflate the similarity of works that in fact differ in content. Plagiarism detection based on verbatim similarity can be rendered ineffective when works are paraphrased, even in superficial and immaterial ways. Considering linguistic information related to the creative aspects of writing can improve identification of plagiarism by adding a crucial dimension to the evaluation of similarity: documents that share linguistic elements in addition to content are more likely to be copied from each other. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing and show that combining information about the linguistic similarity of works with similarity measurements based on tf-idf-weighted keywords improves recognition of plagiarism over tf-idf-weighted keywords alone.
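
The false-negative problem described in the abstract can be illustrated with a minimal sketch of keyword similarity. Everything in it is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the smoothed idf formula, the cosine measure, the function names `tfidf_vectors` and `cosine`, and the three toy sentences. It shows only the keyword baseline, not the syntactic features the paper proposes.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} dict per document.

    Assumption: weight = raw term frequency * smoothed idf,
    where idf = log((1 + N) / (1 + df)) + 1. The paper's exact
    weighting scheme is not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy sentences, not data from the paper.
    original = "the doctor examined the patient quickly"
    paraphrase = "the physician examined the patient rapidly"
    unrelated = "stock prices fell sharply on monday"

    vecs = tfidf_vectors([original, paraphrase, unrelated])
    print("original vs paraphrase:", cosine(vecs[0], vecs[1]))
    print("original vs unrelated: ", cosine(vecs[0], vecs[2]))
```

In this toy setting, swapping two words for synonyms pulls the keyword score of the paraphrase below that of a verbatim copy even though the content is unchanged, which is the kind of gap that an additional, syntax-based similarity signal is meant to close.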