Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment

  • Authors:
  • Robert Gilmore Pontius, Jr; Marco Millones

  • Affiliation (both authors):
  • School of Geography, Clark University, Worcester, MA, USA

  • Venue:
  • International Journal of Remote Sensing
  • Year:
  • 2011

Abstract

The family of Kappa indices of agreement claims to compare a map's observed classification accuracy relative to the expected accuracy of baseline maps that can have two types of randomness: (1) random distribution of the quantity of each category and (2) random spatial allocation of the categories. Use of the Kappa indices has become part of the culture in remote sensing and other fields. This article examines five different Kappa indices, some of which were derived by the first author in 2000. We expose the indices' properties mathematically and illustrate their limitations graphically, with emphasis on Kappa's use of randomness as a baseline and the often-ignored conversion from an observed sample matrix to the estimated population matrix. This article concludes that these Kappa indices are useless, misleading and/or flawed for the practical applications in remote sensing that we have seen. After more than a decade of working with these indices, we recommend that the profession abandon the use of Kappa indices for purposes of accuracy assessment and map comparison, and instead summarize the cross-tabulation matrix with two much simpler summary parameters: quantity disagreement and allocation disagreement. This article shows how to compute these two parameters using examples taken from peer-reviewed literature.
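The two summary parameters the abstract proposes can be computed directly from a cross-tabulation (confusion) matrix. The sketch below is one possible NumPy implementation, assuming the standard definitions: quantity disagreement is half the sum of absolute differences between each category's row and column marginals, and allocation disagreement is the remainder of total disagreement (one minus the diagonal sum). The function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def quantity_allocation_disagreement(matrix):
    """Compute (quantity, allocation) disagreement from a square
    cross-tabulation matrix (rows = comparison map, columns =
    reference map), given as counts or proportions."""
    p = np.asarray(matrix, dtype=float)
    p = p / p.sum()                 # normalize counts to proportions
    row = p.sum(axis=1)             # category proportions in the comparison map
    col = p.sum(axis=0)             # category proportions in the reference map
    diag = np.diag(p)               # agreement on the diagonal
    # Quantity disagreement: mismatch in category totals
    quantity = np.abs(row - col).sum() / 2
    # Allocation disagreement: spatial mismatch given the totals
    allocation = (2 * np.minimum(row - diag, col - diag)).sum() / 2
    return quantity, allocation
```

By construction the two parameters sum to the total disagreement, `1 - trace(p)`, so for a proportions matrix `[[0.4, 0.1], [0.2, 0.3]]` the function returns a quantity disagreement of 0.1 and an allocation disagreement of 0.2, together accounting for the 0.3 of the map that disagrees with the reference.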