Simultaneous selection of variables and smoothing parameters in structured additive regression models

  • Authors: Christiane Belitz; Stefan Lang

  • Affiliations: Department of Statistics, University of Munich, Ludwigstr. 33, D-80539 Munich, Germany; Department of Statistics, University of Innsbruck, Universitätsstr. 15, A-6020 Innsbruck, Austria

  • Venue: Computational Statistics & Data Analysis
  • Year: 2008

Abstract

In recent years, considerable research has been devoted to developing complex regression models that can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different types. Much less effort, however, has been devoted to model and variable selection. The paper develops a methodology for the simultaneous selection of variables and the degree of smoothness in regression models with a structured additive predictor. These models are quite general, containing additive (mixed) models, geoadditive models and varying coefficient models as special cases. The approach allows one to decide whether a particular covariate enters the model linearly, enters it nonlinearly, or is removed from the model. Moreover, it is possible to decide whether a spatial or cluster-specific effect should be incorporated into the model to cope with spatial or cluster-specific heterogeneity. Particular emphasis is also placed on selecting complex interactions between covariates and effects of different types. A new penalty for two-dimensional smoothing is proposed that allows for ANOVA-type decompositions into main effects and an interaction effect without explicitly specifying the main effects. The penalty is an additive combination of other penalties. Fast algorithms and software are developed that can handle even situations with many covariate effects and observations. The algorithms are related to backfitting and Markov chain Monte Carlo (MCMC) techniques, which split the problem into smaller pieces in a divide-and-conquer strategy. Confidence intervals that take model uncertainty into account are based on the bootstrap in combination with MCMC techniques.
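
The per-covariate decision described in the abstract (remove the covariate, keep it as a linear effect, or model it nonlinearly with a chosen degree of smoothness) can be pictured with the minimal sketch below. It is not the authors' algorithm or software, and it covers a single covariate only. Several things are assumptions made purely for illustration: the function names `aic`, `smooth_design` and `select_term`, the truncated-line basis with a ridge penalty standing in for the paper's penalized smoothers, the use of a Gaussian AIC as the selection criterion, the grid of smoothing parameters, and the simulated data.

```python
import numpy as np

def aic(y, fitted, df):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*df."""
    n = len(y)
    rss = float(np.sum((y - fitted) ** 2))
    return n * np.log(rss / n) + 2.0 * df

def smooth_design(x, n_knots=10):
    """Intercept, linear term, and truncated-line basis at quantile knots (assumed basis)."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, trunc])

def select_term(x, y, lambdas=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Score 'removed', 'linear' and 'smooth' (one per lambda) and return the AIC minimizer."""
    n = len(y)
    candidates = {}

    # Candidate 1: covariate removed, intercept-only fit (1 degree of freedom).
    candidates[("removed", None)] = aic(y, np.full(n, y.mean()), 1.0)

    # Candidate 2: linear effect fitted by ordinary least squares (2 degrees of freedom).
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    candidates[("linear", None)] = aic(y, X @ beta, 2.0)

    # Candidate 3: penalized smooth effect, one candidate per smoothing parameter.
    Z = smooth_design(x)
    P = np.eye(Z.shape[1])
    P[0, 0] = P[1, 1] = 0.0  # intercept and linear term are left unpenalized
    for lam in lambdas:
        H = Z @ np.linalg.solve(Z.T @ Z + lam * P, Z.T)  # hat matrix of the penalized fit
        candidates[("smooth", lam)] = aic(y, H @ y, float(np.trace(H)))  # df = trace(H)

    best = min(candidates, key=candidates.get)
    return best, candidates

# Simulated example (assumed data): a clearly nonlinear signal plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
best, scores = select_term(x, y)
print(best)
```

For a signal this strongly nonlinear, the smooth candidate should win by a wide margin. The methodology in the paper makes this kind of choice simultaneously for many terms, including spatial, cluster-specific and interaction effects, which this single-covariate sketch does not attempt.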