Semi-supervised recognition of sarcastic sentences in Twitter and Amazon

Authors:
Dmitry Davidov; Oren Tsur; Ari Rappoport
Affiliations:
The Hebrew University, Jerusalem, Israel; The Hebrew University, Jerusalem, Israel; The Hebrew University, Jerusalem, Israel
Venue:
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Year:
2010


Abstract

Sarcasm is a form of speech act in which speakers convey their message implicitly. Its inherently ambiguous nature sometimes makes it hard even for humans to decide whether an utterance is sarcastic. Recognition of sarcasm can benefit many sentiment analysis NLP applications, such as review summarization, dialogue systems and review ranking systems. In this paper we experiment with semi-supervised sarcasm identification on two very different data sets: a collection of 5.9 million tweets from Twitter, and a collection of 66,000 product reviews from Amazon. Using Mechanical Turk, we created a gold standard sample in which each sentence was tagged by 3 annotators; evaluated against this sample, the algorithm obtains F-scores of 0.78 on the product reviews dataset and 0.83 on the Twitter dataset. We discuss the differences between the datasets and how the algorithm uses them (e.g., for the Amazon dataset the algorithm makes use of structured information). We also discuss the utility of Twitter #sarcasm hashtags for the task.
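
As an illustration of the evaluation described in the abstract, the following minimal sketch shows how an F-score could be computed against a gold standard built from 3 annotators per sentence. It is not the authors' code: the majority-vote rule for combining annotations, the function names `majority_label` and `f_score`, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators.

    Assumption: annotations are combined by majority vote; the abstract
    only states that each sentence was tagged by 3 annotators.
    """
    return Counter(votes).most_common(1)[0][0]


def f_score(gold, predicted):
    """Compute the F1 score for the positive (sarcastic) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one tuple of 3 annotator votes per sentence,
# where True means "sarcastic".
annotations = [
    (True, True, False),
    (False, False, False),
    (True, True, True),
    (False, True, False),
]
gold = [majority_label(votes) for votes in annotations]

# Hypothetical classifier output for the same four sentences.
predicted = [True, True, True, False]

score = f_score(gold, predicted)
print(f"F-score: {score:.2f}")
```

With an odd number of annotators and binary labels, a majority always exists, so no tie-breaking rule is needed in this sketch.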