Subjectivity annotation of the microblog 2011 realtime adhoc relevance judgments

Authors:
Georgios Paltoglou;Kevan Buckley
Affiliations:
School of Technology, University of Wolverhampton, Wolverhampton, UK;School of Technology, University of Wolverhampton, Wolverhampton, UK
Venue:
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year:
2013

Citing 4
Cited 0

Assessing agreement on classification tasks: the kappa statistic

Computational Linguistics
Thumbs up?: sentiment classification using machine learning techniques

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Sentiment in short strength detection informal text

Journal of the American Society for Information Science and Technology
On building a reusable Twitter corpus

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work, we extend the Microblog dataset with subjectivity annotations. Our aim is twofold; first, we want to provide a high-quality, multiply-annotated gold standard of subjectivity annotations for the relevance assessments of the real-time adhoc task. Second, we randomly sample the rest of the dataset and annotate it for subjectivity once, in order to create a complementary annotated dataset that is at least an order of magnitude larger than the gold standard. As a result we have 2,389 tweets that have been annotated by multiple humans and 75,761 tweets that have been annotated by one annotator. We discuss issues like inter-annotator agreement, the time that it took annotators to classify tweets in correlation to their subjective content and lastly, the distribution of subjective tweets in relation to topic categorization. The annotated datasets and all relevant anonymised information are freely available for research purposes.