Using keyword overlap to identify plagiarism can produce many false negatives and false positives. Substituting synonyms for one another lowers the similarity between works, which makes plagiarism hard to recognize. Overlap in ambiguous keywords, by contrast, can falsely inflate the similarity of works whose content actually differs. Plagiarism detection based on verbatim similarity can also be rendered ineffective when works are paraphrased, even in superficial and immaterial ways. Considering linguistic information related to creative aspects of writing can improve the identification of plagiarism by adding a crucial dimension to the evaluation of similarity: documents that share linguistic elements in addition to content are more likely to have been copied from one another. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing. We show that, when combined with similarity measurements based on tfidf-weighted keywords, information about the linguistic similarity of works improves the recognition of plagiarism over tfidf-weighted keywords alone.
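The Python sketch below illustrates the kind of comparison described above. It does not reproduce the authors' method. The abstract does not specify its syntactic structures, so adjacent part-of-speech tag pairs (`tag_bigram_vector`) stand in as a hypothetical linguistic feature. The smoothed idf formula and the linear weight `alpha` in the hypothetical `combined_similarity` are also assumptions. POS tags are assigned by hand rather than produced by a tagger.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one tfidf-weighted term vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter()  # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption): log((1 + n) / (1 + df)) + 1 keeps shared terms nonzero.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def tag_bigram_vector(tags):
    """Hypothetical stand-in for a low-level syntactic feature: counts of adjacent POS-tag pairs."""
    return Counter(zip(tags, tags[1:]))

def combined_similarity(keyword_sim, linguistic_sim, alpha=0.5):
    """Hypothetical linear blend; alpha is illustrative, not a value from the paper."""
    return alpha * keyword_sim + (1 - alpha) * linguistic_sim

# Usage: a synonym-substituted paraphrase, with hand-assigned Penn Treebank tags.
doc_a = "the quick cat jumped over the lazy dog".split()
doc_b = "the fast cat leapt over the idle dog".split()
tags = ["DT", "JJ", "NN", "VBD", "IN", "DT", "JJ", "NN"]  # same tag sequence for both sentences

vec_a, vec_b = tfidf_vectors([doc_a, doc_b])
keyword_sim = cosine(vec_a, vec_b)
linguistic_sim = cosine(tag_bigram_vector(tags), tag_bigram_vector(tags))
score = combined_similarity(keyword_sim, linguistic_sim)
print(f"keyword={keyword_sim:.2f} linguistic={linguistic_sim:.2f} combined={score:.2f}")
```

In this toy pair, synonym substitution pulls the tfidf keyword similarity well below 1. The tag-bigram similarity stays at 1, however, because the paraphrase keeps the original sentence structure. As a result, the blended score ranks the pair as more similar than keywords alone would.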