Detecting copyright infringement between literary works requires evaluating similarity in the expression of content, not just in the content itself. Copyright law protects expression; similarity in content alone is not enough to indicate infringement. Expression refers to the way people convey particular information; it captures both the information and the manner of its presentation. In this paper, we present a novel set of linguistically informed features that provide a computational definition of expression and that enable accurate recognition of individual titles and their paraphrases more than 80% of the time. In comparison, baseline features such as tf-idf-weighted keywords and function words give an accuracy of at most 53%. Our computational definition of expression uses linguistic features that are extracted from POS-tagged text using context-free grammars, without incurring the computational cost of full parsers. The results indicate that informative linguistic features need not be prohibitively expensive to extract.
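To make the idea of extracting shallow syntactic features from POS-tagged text concrete, the following is a minimal sketch and not the feature set used in the paper. It assumes Penn Treebank-style tags and approximates a small grammar with regular expressions over the tag sequence, which is a simplification of the context-free grammars mentioned above. The pattern names (`noun_phrase`, `verb_participle`, `prep_phrase`) and the functions `pattern_counts` and `document_features` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns over Penn Treebank-style tags. They are illustrative
# only and are NOT the features defined in the paper. Each tag in the input
# string is followed by one space; (?<!\S) anchors a match at a tag boundary
# so that, for example, "DT" cannot match inside "PDT".
PATTERNS = {
    "noun_phrase": re.compile(r"(?<!\S)(?:DT )?(?:JJ )*(?:NN[SP]* )+"),
    "verb_participle": re.compile(r"(?<!\S)VB[DZP]? VBN "),
    "prep_phrase": re.compile(r"(?<!\S)IN (?:DT )?(?:JJ )*(?:NN[SP]* )+"),
}

def pattern_counts(tagged):
    """Count non-overlapping matches of each pattern in one tagged sentence.

    `tagged` is a list of (word, tag) pairs produced by any POS tagger.
    """
    tags = "".join(tag + " " for _, tag in tagged)
    return {name: len(p.findall(tags)) for name, p in PATTERNS.items()}

def document_features(sentences):
    """Average the per-sentence pattern counts over a document."""
    totals = Counter()
    for sentence in sentences:
        totals.update(pattern_counts(sentence))
    n = max(len(sentences), 1)  # avoid division by zero on empty input
    return {name: totals[name] / n for name in PATTERNS}

# Usage: a hand-tagged sentence, so no tagger is needed to run the sketch.
sample = [
    ("The", "DT"), ("book", "NN"), ("was", "VBD"), ("written", "VBN"),
    ("by", "IN"), ("a", "DT"), ("famous", "JJ"), ("author", "NN"),
]
print(pattern_counts(sample))
print(document_features([sample, sample]))
```

Only the tag sequence is scanned, so the cost is linear in the number of tokens for these patterns, which is the kind of saving over full parsing that the abstract refers to. A real system would need a richer grammar and a downstream classifier, neither of which is shown here.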