Measuring overlap in binary regression

  • Authors:
  • Andreas Christmann; Peter J. Rousseeuw

  • Affiliations:
  • University of Dortmund, SFB-475, HRZ, D-44221 Dortmund, Germany; Department of Mathematics and Computer Science, Universitaire Instelling Antwerpen (UIA), Universiteitsplein 1, B-2610 Wilrijk, Belgium

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2001

Abstract

In this paper, we show that the recent notion of regression depth can be used as a data-analytic tool to measure the amount of separation between successes and failures in the binary response framework. Extending the algorithm for computing regression depth allows us to compute the overlap in data sets that are commonly fitted by logistic or probit regression models. The overlap is the number of observations that would need to be removed to obtain complete or quasi-complete separation, i.e., the situation where the regression parameters are no longer identifiable and the maximum likelihood estimate does not exist. It turns out that the overlap is often quite small. The results are equally useful in linear discriminant analysis.
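
To make the definition of overlap concrete, the following is a minimal brute-force sketch for the special case of a single predictor with an intercept. It is not the regression-depth-based algorithm of the paper; the function name `overlap_1d` and the toy data are assumptions made for illustration. In this one-dimensional setting, complete or quasi-complete separation means that some threshold `t` and orientation `s` place all successes weakly on one side and all failures weakly on the other, so the overlap is the smallest number of observations violating that condition. The sketch assumes responses coded 0/1 and a predictor that takes at least two distinct values.

```python
def overlap_1d(x, y):
    """Smallest number of observations to remove so that the remaining
    (x, y) pairs are completely or quasi-completely separated.

    Illustrative brute force for one predictor plus intercept only;
    x is a sequence of numbers, y a sequence of 0/1 responses.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    best = len(x)
    # Because the separation condition is non-strict, it is enough to try
    # thresholds located at the observed predictor values.
    for t in set(x):
        # s = +1: successes on the right of t; s = -1: successes on the left.
        for s in (1, -1):
            violations = sum(
                1
                for xi, yi in zip(x, y)
                if (yi == 1 and s * (xi - t) < 0)
                or (yi == 0 and s * (xi - t) > 0)
            )
            best = min(best, violations)
    return best


# Toy data (hypothetical): the failure at x = 4 lies among the successes.
print(overlap_1d([1, 2, 3, 4, 5, 6], [0, 0, 1, 0, 1, 1]))  # 1
# Already separated data has overlap 0.
print(overlap_1d([1, 2, 3, 4], [0, 0, 1, 1]))  # 0
```

An overlap of 0 signals that the maximum likelihood estimate for a logistic fit to these data does not exist. The brute force costs on the order of n² operations for one predictor and does not scale to several predictors, which is where an approach based on regression depth, as described in the abstract, becomes relevant.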