This paper establishes an aspirational value for automated opinion detection systems (AODSs) by determining the level of agreement that human assessors can achieve when judging whether a document expresses an opinion on a given topic. Knowing the level of agreement among humans is important because it sets an upper bound on the expected performance of an AODS. Seven human assessors re-judged the TREC Blog06 opinion data; their agreement with the official assessments was measured and compared with the submitted results of the Blog06 participants. The assessors achieved a fair level of agreement with one another, although the range across assessors was large. It is therefore recommended that multiple assessors be used to assess opinion data, or that assessors be pre-tested so that the most dissenting ones can be removed from the pool before assessment begins. The possibility of inconsistent assessments in a corpus also raises concerns about training data for an AODS, so a further recommendation is that AODS training data be assembled from a variety of sources. While the AODSs surveyed achieved satisfactory results, none came close to the human upper bound.
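The label "fair" agreement is the Landis and Koch term for kappa values between 0.21 and 0.40, which suggests a kappa-style statistic was used; the paper's exact computation is not reproduced here. The sketch below shows Fleiss' kappa, a standard agreement measure for more than two assessors, applied to hypothetical judgment counts. The function name, the example data, and the choice of Fleiss' kappa are assumptions for illustration, not details taken from the paper.

# Minimal sketch of Fleiss' kappa for agreement among multiple assessors.
# The counts below are hypothetical: each row is one document, each
# column an opinion label (0 = no opinion, 1 = opinion), and cell [i][j]
# is how many of the seven assessors gave document i label j.

def fleiss_kappa(counts: list[list[int]]) -> float:
    """counts[i][j] = number of raters assigning item i to category j.
    Every row must sum to the same number of raters n."""
    N = len(counts)        # number of items
    n = sum(counts[0])     # raters per item
    k = len(counts[0])     # number of categories

    # Mean per-item observed agreement P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement P_e from the marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / (N * n)) ** 2 for j in range(k)
    )
    return (p_bar - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Hypothetical judgments: 5 documents, 7 assessors, 2 labels.
    judgments = [
        [2, 5],  # 2 assessors said "no opinion", 5 said "opinion"
        [6, 1],
        [3, 4],
        [7, 0],
        [1, 6],
    ]
    print(f"Fleiss' kappa = {fleiss_kappa(judgments):.3f}")

On these made-up counts the function returns roughly 0.35, which falls in the Landis and Koch "fair" band (0.21 to 0.40), the same qualitative level the paper reports for its seven assessors.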