A Bayesian network framework for reject inference

  • Authors:
  • Andrew Smith;Charles Elkan

  • Affiliations:
  • University of California - San Diego, La Jolla, CA;University of California - San Diego, La Jolla, CA

  • Venue:
  • Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most learning methods assume that the training set is drawn randomly from the population to which the learned model is to be applied. However in many applications this assumption is invalid. For example, lending institutions create models of who is likely to repay a loan from training sets consisting of people in their records to whom loans were given in the past; however, the institution approved loan applications previously based on who was thought unlikely to default. Learning from only approved loans yields an incorrect model because the training set is a biased sample of the general population of applicants. The issue of including rejected samples in the learning process, or alternatively using rejected samples to adjust a model learned from accepted samples only, is called reject inference.The main contribution of this paper is a systematic analysis of different cases that arise in reject inference, with explanations of which cases arise in various real-world situations. We use Bayesian networks to formalize each case as a set of conditional independence relationships and identify eight cases, including the familiar missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) cases. For each case we present an overview of available learning algorithms. These algorithms have been published in separate fields of research, including epidemiology, econometrics, clinical trial evaluation, sociology, and credit scoring; our second major contribution is to describe these algorithms in a common framework.