Selection and validation of parameters in multiple linear and principal component regressions

  • Authors:
  • J. C. M. Pires; F. G. Martins; S. I. V. Sousa; M. C. M. Alvim-Ferraz; M. C. Pereira

  • Affiliations:
  • LEPAE, Departamento de Engenharia Química, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal (all authors)

  • Venue:
  • Environmental Modelling & Software
  • Year:
  • 2008

Abstract

This paper aims to select statistically valid regression parameters using multiple linear and principal component regression models. Five selection methods were used: (i) backward elimination based on confidence interval limits; (ii) backward elimination based on the correlation coefficient; (iii) forward selection based on the correlation coefficient; (iv) forward selection based on the sum of squared errors; and (v) evaluation of all combinations of variables. The methods were applied to a case study: determining the parameters that influence the concentration of tropospheric ozone. The explanatory variables were meteorological data (temperature, relative humidity, wind speed, wind direction and solar radiation) and environmental data (nitrogen oxides and ozone concentrations of the previous day). Each selection method led to a different multiple linear regression model, a consequence of the collinearities between explanatory variables. Such collinearities can be removed by pre-processing the explanatory data set with principal component analysis. With this pre-processing, all selection methods led to the same regression model.
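
The principal component pre-processing and the forward selection based on the sum of squared errors mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (two deliberately collinear columns plus one independent column), the helper names `standardize`, `pca_scores`, `sse` and `forward_selection` are hypothetical, and the stopping rule (`min_rel_gain`, a minimum relative reduction in the sum of squared errors) is an assumption standing in for whatever statistical criterion the paper applies.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X):
    """Principal component scores of the standardized data, computed via SVD."""
    Z = standardize(X)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T  # columns are mutually uncorrelated


def sse(X, y, cols):
    """Sum of squared errors of a least-squares fit (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_selection(X, y, min_rel_gain=0.01):
    """Greedy forward selection on SSE.

    At each step, add the column that reduces the SSE the most. Stop when the
    best candidate lowers the SSE by less than `min_rel_gain` times the current
    SSE (an assumed stopping rule, chosen only for illustration).
    """
    selected = []
    remaining = list(range(X.shape[1]))
    current = sse(X, y, selected)
    while remaining:
        best_sse, best_j = min((sse(X, y, selected + [j]), j) for j in remaining)
        if current - best_sse < min_rel_gain * current:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_sse
    return selected, current


# Synthetic data (assumption): x2 is nearly a copy of x1, so the two are collinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + x3 + 0.5 * rng.normal(size=n)

scores = pca_scores(X)
raw_selected, raw_sse = forward_selection(X, y)
pc_selected, pc_sse = forward_selection(scores, y)

print("correlation of raw x1 and x2:", np.corrcoef(x1, x2)[0, 1])
print("selected raw columns:", raw_selected)
print("selected principal components:", pc_selected)
```

Because the score columns are uncorrelated, the contribution of each component to the fit does not depend on which other components are already in the model, which is why the order of inclusion or elimination matters less after this pre-processing than it does for the raw, collinear variables.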