Capturing expression using linguistic information

Authors:
Özlem Uzuner;Boris Katz
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
Venue:
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
Year:
2005

Abstract

Recognizing similarities between literary works for copyright infringement detection requires evaluating similarity in the expression of content. Copyright law protects expression of content; similarities in content alone are not enough to indicate infringement. Expression refers to the way people convey particular information; it captures both the information and the manner of its presentation. In this paper, we present a novel set of linguistically informed features that provide a computational definition of expression and that enable recognition of individual titles and their paraphrases more than 80% of the time. In comparison, baseline features such as tf-idf-weighted keywords and function words give an accuracy of at most 53%. Our computational definition of expression uses linguistic features that are extracted from POS-tagged text using context-free grammars, without incurring the computational cost of full parsers. The results indicate that informative linguistic features need not be prohibitively expensive to extract.
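
To make the idea of extracting features from POS-tagged text without a full parser concrete, the following is a minimal sketch. It is not the paper's feature set or grammar: it swaps in regular-expression patterns over the tag sequence as a simpler stand-in for context-free grammars. The pattern names, the `tag_pattern_counts` helper, and the sample sentence are all hypothetical, and the tags are assumed to follow Penn Treebank conventions.

```python
import re
from collections import Counter

# Hypothetical toy patterns over Penn Treebank-style tags.
# These are illustrative assumptions, not the features used in the paper.
PATTERNS = {
    # Optional determiner, any number of adjectives, one or more common nouns.
    "noun_phrase": re.compile(r"\b(?:DT )?(?:JJ )*(?:NNS? )+"),
    # A finite verb immediately followed by a past participle.
    "finite_verb_plus_participle": re.compile(r"\b(?:VBD|VBZ|VBP) VBN "),
}


def tag_pattern_counts(tagged):
    """Count non-overlapping matches of each pattern in one tagged sentence.

    `tagged` is a list of (word, tag) pairs. Only the tags are inspected,
    so no parse tree is ever built.
    """
    # Join the tags with a trailing space after each one so that every
    # pattern can anchor on whole tags rather than on tag substrings.
    tags = "".join(tag + " " for _, tag in tagged)
    return Counter(
        {name: len(pattern.findall(tags)) for name, pattern in PATTERNS.items()}
    )


# A hand-tagged sample sentence (assumed tagger output).
sample = [
    ("The", "DT"), ("old", "JJ"), ("book", "NN"),
    ("was", "VBD"), ("written", "VBN"),
    ("by", "IN"), ("an", "DT"), ("author", "NN"),
]
counts = tag_pattern_counts(sample)
```

Counts like these, collected per sentence and aggregated per document, could then serve as a feature vector for a classifier. Because the patterns scan a flat tag sequence, extraction costs far less than full parsing, which is the trade-off the abstract highlights.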