A diagnostic method for simultaneous feature selection and outlier identification in linear regression

  • Authors: Rajiv S. Menjoge; Roy E. Welsch
  • Affiliations: Operations Research Center, M.I.T., Cambridge, MA 02139, United States; Sloan School of Management, M.I.T., Cambridge, MA, United States
  • Venue: Computational Statistics & Data Analysis
  • Year: 2010

Abstract

A diagnostic method along the lines of forward search is proposed for studying, simultaneously, the effect of individual observations and features on the inferences made in linear regression. The method appends dummy variables to the data matrix and performs backward selection on the augmented matrix. It outputs sequences of feature-outlier combinations, which can be evaluated with plots similar to those of forward search, and it can incorporate prior knowledge to mitigate issues such as collinearity. It also allows for alternative ways of understanding the selection of the final model. The method is evaluated on five data sets and yields promising results.
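
The core idea of appending dummy variables and running backward selection on the augmented matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the elimination criterion or how the augmented problem is kept well-posed, so the sketch assumes a greedy rule (drop the column whose removal increases the residual sum of squares least) and assumes dummies are added only for a user-supplied list of candidate observations. The names `rss`, `backward_path`, and `candidates` are hypothetical.

```python
import numpy as np


def rss(A, y):
    """Residual sum of squares of the least-squares fit of y on the columns of A."""
    if A.shape[1] == 0:
        return float(y @ y)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def backward_path(X, y, candidates):
    """Greedy backward elimination on [X | D].

    D holds one indicator (dummy) column per candidate observation.
    ASSUMPTION: restricting dummies to `candidates` keeps the augmented
    matrix full column rank; the paper may handle this differently.
    Returns a list of ((kind, index), rss_after_removal), ordered from the
    first column dropped (least important) to the last (most important).
    """
    n, p = X.shape
    m = len(candidates)
    D = np.zeros((n, m))
    D[candidates, np.arange(m)] = 1.0  # dummy j flags observation candidates[j]
    A = np.hstack([X, D])
    labels = [("feature", j) for j in range(p)]
    labels += [("obs", int(i)) for i in candidates]

    active = list(range(p + m))
    path = []
    while active:
        # RSS of the model obtained by dropping each active column in turn.
        scores = [rss(A[:, [c for c in active if c != k]], y) for k in active]
        best = int(np.argmin(scores))
        dropped = active.pop(best)
        path.append((labels[dropped], scores[best]))
    return path
```

Read from the end, the returned path lists the features and flagged observations that the fit can least afford to lose. A dummy that survives late suggests that its observation is poorly explained by the remaining features. The RSS values along the path could feed a plot in the spirit of the forward-search plots the abstract mentions.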