Inequalities between multi-rater kappas

Authors:
Matthijs J. Warrens
Affiliations:
Unit Methodology and Statistics, Institute of Psychology, Leiden University, Leiden, The Netherlands 2300 RB
Venue:
Advances in Data Analysis and Classification
Year:
2010

Abstract

The paper presents inequalities between four descriptive statistics that have been used to measure nominal agreement between two or more raters. Each of the four statistics is a function of the pairwise information. Light's kappa and Hubert's kappa are multi-rater versions of Cohen's kappa. Fleiss' kappa is a multi-rater extension of Scott's pi, whereas Randolph's kappa generalizes the S statistic of Bennett et al. to multiple raters. While a consistent ordering between the numerical values of these agreement measures has frequently been observed in practice, there has thus far been no theoretical proof of a general ordering inequality among them. It is proved that Fleiss' kappa is a lower bound of Hubert's kappa and Randolph's kappa, and that Randolph's kappa is an upper bound of Hubert's kappa and Light's kappa if all pairwise agreement tables are weakly marginal symmetric or if all raters assign a certain minimum proportion of the objects to a specified category.
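To make the four statistics concrete, the following is a minimal Python sketch that computes them from a small table of nominal ratings. It is an illustration under stated assumptions, not the paper's own formulation: the function name `multi_rater_kappas`, the input layout (one row per object, one category index per rater), and the toy data are all hypothetical, and the formulas are the commonly used textbook definitions (Light: mean of pairwise Cohen's kappas; Hubert: mean pairwise observed and chance agreement combined in one ratio; Fleiss: chance agreement from category proportions pooled over raters; Randolph: chance agreement fixed at one over the number of categories).

```python
from itertools import combinations


def multi_rater_kappas(ratings, n_categories):
    """Return Light's, Hubert's, Fleiss' and Randolph's kappa.

    ratings: list of rows, one per object; each row holds one category
    index in 0..n_categories-1 per rater (hypothetical layout).
    Assumes no chance-agreement term equals 1, which would divide by zero.
    """
    n_objects = len(ratings)
    n_raters = len(ratings[0])

    # p[r][k]: proportion of objects that rater r assigns to category k.
    p = [[0.0] * n_categories for _ in range(n_raters)]
    for row in ratings:
        for r, k in enumerate(row):
            p[r][k] += 1.0 / n_objects

    # Pairwise observed agreement and Cohen-style chance agreement.
    po_pairs, pe_pairs = [], []
    for i, j in combinations(range(n_raters), 2):
        po_pairs.append(sum(row[i] == row[j] for row in ratings) / n_objects)
        pe_pairs.append(sum(p[i][k] * p[j][k] for k in range(n_categories)))

    n_pairs = len(po_pairs)
    mean_po = sum(po_pairs) / n_pairs
    mean_pe = sum(pe_pairs) / n_pairs

    # Light: average of the pairwise Cohen's kappas.
    light = sum((po - pe) / (1 - pe) for po, pe in zip(po_pairs, pe_pairs)) / n_pairs

    # Hubert: average the pairwise quantities first, then form one ratio.
    hubert = (mean_po - mean_pe) / (1 - mean_pe)

    # Fleiss: chance agreement from category proportions pooled over raters.
    pooled = [sum(p[r][k] for r in range(n_raters)) / n_raters
              for k in range(n_categories)]
    pe_fleiss = sum(q * q for q in pooled)
    fleiss = (mean_po - pe_fleiss) / (1 - pe_fleiss)

    # Randolph: chance agreement is 1 / (number of categories).
    pe_randolph = 1.0 / n_categories
    randolph = (mean_po - pe_randolph) / (1 - pe_randolph)

    return {"light": light, "hubert": hubert,
            "fleiss": fleiss, "randolph": randolph}


# Toy data (made up for illustration): 8 objects, 3 raters, 3 categories.
ratings = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [2, 2, 2],
    [0, 0, 2],
]
kappas = multi_rater_kappas(ratings, 3)
for name, value in sorted(kappas.items(), key=lambda kv: kv[1]):
    print(f"{name:9s} {value:.4f}")
```

Printing the values sorted by size makes it easy to compare the observed ordering with the inequalities stated in the abstract. Under these definitions, Fleiss' kappa cannot exceed Hubert's or Randolph's kappa, since its chance-agreement term is at least as large as theirs; the bounds involving Randolph's kappa as an upper limit hold only under the conditions the abstract names.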