Robustness against separation and outliers in logistic regression

  • Authors: Peter J. Rousseeuw; Andreas Christmann

  • Affiliations: Department of Mathematics and Computer Science, Universitaire Instelling Antwerpen (UIA), Universiteitsplein 1, B-2610 Wilrijk, Belgium; University of Dortmund, HRZ, Abteilung A1 University of Dortmund, D-44221 Dortmund, Germany

  • Venue: Computational Statistics & Data Analysis
  • Year: 2003

Abstract

The logistic regression model is commonly used to describe the effect of one or several explanatory variables on a binary response variable. It suffers from the problem that its parameters are not identifiable when there is separation in the space of the explanatory variables. In that case, existing fitting techniques fail to converge or give the wrong answer. To remedy this, a slightly more general model is proposed under which the observed response is strongly related but not equal to the unobservable true response. This model will be called the hidden logistic regression model because the unobservable true responses are comparable to a hidden layer in a feedforward neural net. The maximum estimated likelihood estimator is proposed in this model. It is robust against separation, always exists, and is easy to compute. Outlier-robust estimation is also studied in this setting, yielding the weighted maximum estimated likelihood estimator.
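The idea can be illustrated with a minimal sketch in Python (NumPy). The abstract does not spell out how the estimated likelihood is formed, so the sketch rests on stated assumptions: each observed response y in {0, 1} is replaced by a pseudo-response delta0 (for y = 0) or delta1 (for y = 1), and an ordinary logistic likelihood is then maximized with these fractional responses. The values delta0 = 0.01 and delta1 = 0.99, the helper names pseudo_responses and fit_logistic, and the toy data are hypothetical choices for illustration only, not the authors' recommended settings or implementation. The toy data are perfectly separated at x = 0, so the usual maximum likelihood estimate has no finite solution, whereas the fit on pseudo-responses stays finite.

```python
import numpy as np

def pseudo_responses(y, delta0=0.01, delta1=0.99):
    """Map observed 0/1 responses to fractional pseudo-responses.

    delta0 and delta1 are illustrative values (assumption), standing for
    P(observed = 1 | true = 0) and P(observed = 1 | true = 1).
    """
    y = np.asarray(y, dtype=float)
    return (1.0 - y) * delta0 + y * delta1

def fit_logistic(X, y, n_iter=100, tol=1e-8):
    """Newton-Raphson fit of a logistic model with an intercept.

    y may be fractional in (0, 1). Returns (beta, converged).
    """
    X1 = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))   # fitted probabilities
        grad = X1.T @ (y - p)                     # score vector
        hess = (X1 * (p * (1.0 - p))[:, None]).T @ X1  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta, True
    return beta, False

# Toy data with complete separation: every x < 0 has y = 0, every x > 0 has y = 1.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])

y_tilde = pseudo_responses(y)
beta, converged = fit_logistic(x, y_tilde)
print("converged:", converged)
print("intercept, slope:", beta)
```

Because the pseudo-responses lie strictly between 0 and 1, the objective in this sketch is strictly concave with a finite maximizer, which is one way to read the claim that the estimator "always exists". Calling fit_logistic on the raw 0/1 responses for the same data would instead push the slope toward infinity.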