Partially identified prevalence estimation under misclassification using the kappa coefficient

Authors:
Helmut Küchenhoff;Thomas Augustin;Anne Kunz
Affiliations:
Department of Statistics, Ludwig-Maximilians-Universität (LMU), Munich, Germany;Department of Statistics, Ludwig-Maximilians-Universität (LMU), Munich, Germany;Metronomia Clinical Research GmbH, Munich, Germany
Venue:
International Journal of Approximate Reasoning
Year:
2012

Abstract

We discuss prevalence estimation under misclassification, that is, the estimation of the proportion of units having a certain property (being diseased, showing deviant behavior, etc.) from a random sample when the true variable of interest cannot be observed but a related proxy variable (e.g. the outcome of a diagnostic test) is available. If the misclassification probabilities were known, unbiased prevalence estimation would be possible. We focus on the frequent case where the misclassification probabilities are unknown but two independent replicate measurements have been taken. In the traditional, precise probabilistic framework this information does not allow a correction, because the model is not identified. The imprecise-probability methodology of partial identification and systematic sensitivity analysis, however, yields valuable insights into the possible bias due to misclassification. We derive tight identification intervals and corresponding confidence regions for the true prevalence, based on the often reported kappa coefficient, which condenses the information of the replicates by measuring the agreement between the two measurements. Our method is illustrated in several theoretical scenarios and in an example from oral health on the prevalence of caries in children.
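The two relationships the abstract relies on can be made concrete with a short computation. The sketch below is our own illustration, not the authors' derivation, and every function name in it is hypothetical. It assumes (our assumption; the paper may impose different or additional constraints) that both replicates share the same unknown sensitivity Se and specificity Sp, are independent given the true status, and satisfy Se + Sp > 1. With known Se and Sp, the standard Rogan-Gladen correction inverts p = πSe + (1 − π)(1 − Sp). With only the observed positive rate p and Cohen's kappa κ, this model implies κ = π(1 − π)(Se + Sp − 1)² / (p(1 − p)), so each candidate π fixes (Se, Sp). A grid scan then keeps the π values for which both lie in [0, 1].

```python
import math


def rogan_gladen(p_obs, se, sp):
    """Corrected prevalence when sensitivity and specificity are KNOWN.

    Inverts p_obs = pi * se + (1 - pi) * (1 - sp); needs se + sp > 1.
    """
    if se + sp <= 1.0:
        raise ValueError("need se + sp > 1")
    return (p_obs + sp - 1.0) / (se + sp - 1.0)


def cohen_kappa(n11, n10, n01, n00):
    """Cohen's kappa for two binary replicates from 2x2 cell counts.

    n10 = first measurement positive, second negative (and so on).
    """
    n = n11 + n10 + n01 + n00
    p_o = (n11 + n00) / n                   # observed agreement
    p1 = (n11 + n10) / n                    # positive rate, replicate 1
    q1 = (n11 + n01) / n                    # positive rate, replicate 2
    p_e = p1 * q1 + (1 - p1) * (1 - q1)     # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)


def identification_interval(p_obs, kappa, steps=100_000):
    """Range of prevalences pi compatible with (p_obs, kappa).

    ASSUMPTIONS (illustrative, not taken from the paper): both replicates
    share the same unknown se and sp, are independent given the true
    status, and se + sp > 1. Then
        kappa = pi * (1 - pi) * (se + sp - 1)**2 / (p_obs * (1 - p_obs)),
    so each pi determines se (from kappa) and sp (from p_obs). A grid
    value of pi is kept when both se and sp lie in [0, 1].
    """
    if not (0 < p_obs < 1 and 0 <= kappa < 1):
        raise ValueError("need 0 < p_obs < 1 and 0 <= kappa < 1")
    feasible = []
    for i in range(1, steps):
        pi = i / steps
        # Root se >= p_obs: the branch with se + sp >= 1 (equality iff kappa = 0).
        se = p_obs + math.sqrt(kappa * p_obs * (1 - p_obs) * (1 - pi) / pi)
        # Solve p_obs = pi * se + (1 - pi) * (1 - sp) for sp.
        sp = 1 - (p_obs - pi * se) / (1 - pi)
        if se <= 1 and 0 <= sp <= 1:
            feasible.append(pi)
    return min(feasible), max(feasible)
```

For example, with hypothetical values π = 0.3, Se = 0.9 and Sp = 0.85, the model gives p = 0.375 and κ = 0.504. `rogan_gladen(0.375, 0.9, 0.85)` recovers 0.3 only because Se and Sp are supplied. `identification_interval(0.375, 0.504)` uses p and κ alone and returns roughly (0.232, 0.543), an interval that contains the true value. Under these assumptions the endpoints match the closed form [κp/(1 − p + κp), p/(p + κ(1 − p))], so κ → 1 shrinks the interval to p and κ = 0 leaves essentially all of (0, 1). Sampling error in p and κ, which the paper's confidence regions address, is ignored here.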