Influential data cases when the Cp criterion is used for variable selection in multiple linear regression

  • Authors:
  • S. J. Steel; D. W. Uys

  • Affiliations:
  • Department of Statistics and Actuarial Science, Stellenbosch University, Private Bag X1, 7602 Matieland, South Africa; Department of Statistics and Actuarial Science, Stellenbosch University, Private Bag X1, 7602 Matieland, South Africa

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2006

Abstract

The influence of individual data cases when the Cp criterion is used for variable selection in multiple linear regression is studied in terms of the predictive power of the selected model and the predictor variables it includes. In particular, the focus is on the importance of identifying and dealing with these so-called selection influential data cases before model selection and fitting are performed. A new selection influence measure, based on the Cp criterion, is developed to identify selection influential data cases. The success with which this measure identifies such cases is evaluated on two example data sets.
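
The idea of a selection influential case can be illustrated with a minimal sketch. This is not the influence measure developed in the paper, whose definition is not given in the abstract. It assumes the standard Mallows form Cp = RSS_p / s^2 - n + 2p, where s^2 is the residual variance estimate from the full model and p counts the intercept, and it uses a plain leave-one-out comparison: a case is flagged if deleting it changes the subset that minimises Cp. The function names (`rss`, `mallows_cp`, `best_cp_subset`, `selection_changes`) are hypothetical, and the exhaustive search is practical only for a small number of predictors.

```python
import itertools
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def mallows_cp(X, y, cols, sigma2):
    """Mallows' Cp for a subset: RSS_p / sigma2 - n + 2p, with p counting the intercept."""
    p = len(cols) + 1
    return rss(X, y, cols) / sigma2 - len(y) + 2 * p


def best_cp_subset(X, y):
    """Exhaustive search for the column subset with the smallest Cp."""
    n, k = X.shape
    # Error variance estimated from the full model (assumes n > k + 1).
    sigma2 = rss(X, y, tuple(range(k))) / (n - k - 1)
    best_cols, best_cp = None, np.inf
    for size in range(k + 1):
        for cols in itertools.combinations(range(k), size):
            cp = mallows_cp(X, y, cols, sigma2)
            if cp < best_cp:
                best_cols, best_cp = cols, cp
    return best_cols, best_cp


def selection_changes(X, y):
    """Return the full-data Cp subset and the cases whose deletion changes it."""
    full_cols, _ = best_cp_subset(X, y)
    flagged = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        cols_without_i, _ = best_cp_subset(X[keep], y[keep])
        if cols_without_i != full_cols:
            flagged.append(i)
    return full_cols, flagged
```

Calling `selection_changes(X, y)` on an n-by-k predictor matrix `X` and response vector `y` returns the subset chosen from all the data together with the indices of cases whose removal alters that choice. A case flagged in this way is only a candidate for closer inspection; the paper's own measure may rank cases differently.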