Estimation of a regression spline sample selection model

  • Authors:
  • Giampiero Marra; Rosalba Radice

  • Affiliations:
  • Department of Statistical Science, University College London, London WC1E 6BT, UK; Department of Economics, Mathematics and Statistics, Birkbeck, University of London, London WC1E 7HX, UK

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2013

Abstract

It is often the case that an outcome of interest is observed only for a restricted, non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models, which are based on the estimation of two regressions: a binary selection equation, which determines whether a particular statistical unit will be available in the outcome equation, and the outcome equation itself. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. For continuous responses, methods in which covariate effects are modeled flexibly have been proposed previously, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart that has the advantage of being computationally fast is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. Finally, the approaches are illustrated by analyzing data on annual health expenditures from the RAND Health Insurance Experiment.
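To make the selection-bias problem concrete, here is a minimal sketch of the *classic parametric* setting the abstract contrasts with. It is not the authors' penalized likelihood spline method. It assumes a probit selection equation and a linear Gaussian outcome equation with correlated errors, and it swaps in Heckman's two-step correction (probit first, then OLS with an inverse Mills ratio term) as the estimator. The function names `fit_probit` and `heckman_demo`, the coefficient values, the correlation `rho=0.7` and the extra selection-only regressor `z` (an exclusion restriction) are all hypothetical choices for illustration. The data are simulated, not the RAND data.

```python
import math
import numpy as np

def norm_cdf(t):
    # Standard normal CDF via math.erf, vectorised over NumPy arrays
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(t) / math.sqrt(2.0)))

def norm_pdf(t):
    # Standard normal density
    return np.exp(-0.5 * np.asarray(t) ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(X, d, iters=50, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        phi = norm_pdf(eta)
        denom = p * (1 - p)
        score = X.T @ (phi * (d - p) / denom)             # gradient of log-likelihood
        info = X.T @ (X * (phi ** 2 / denom)[:, None])    # expected information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def heckman_demo(n=20000, rho=0.7, seed=1):
    """Return (naive slope, Heckman two-step slope); the true slope is 2.0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)   # regressor in both equations
    z = rng.normal(size=n)   # enters the selection equation only
    u1 = rng.normal(size=n)
    u2 = rho * u1 + math.sqrt(1 - rho ** 2) * rng.normal(size=n)  # corr(u1, u2) = rho
    X_sel = np.column_stack([np.ones(n), x, z])
    d = (X_sel @ np.array([0.5, 1.0, 1.0]) + u1 > 0).astype(float)  # selection equation
    y = 1.0 + 2.0 * x + u2   # outcome equation (latent for unselected units)
    s = d == 1               # y is only observed where s is True

    # Naive analysis: OLS on the selected units only
    X_out = np.column_stack([np.ones(s.sum()), x[s]])
    naive = np.linalg.lstsq(X_out, y[s], rcond=None)[0]

    # Heckman step 1: probit; step 2: OLS augmented with the inverse Mills ratio
    gamma = fit_probit(X_sel, d)
    eta = X_sel[s] @ gamma
    mills = norm_pdf(eta) / norm_cdf(eta)
    heck = np.linalg.lstsq(np.column_stack([X_out, mills]), y[s], rcond=None)[0]
    return naive[1], heck[1]

if __name__ == "__main__":
    naive_slope, heckman_slope = heckman_demo()
    print(f"naive OLS slope: {naive_slope:.3f}, Heckman two-step slope: {heckman_slope:.3f}")
```

Under these assumptions the naive slope should be pulled below the true value of 2, while the two-step estimate should land close to it. Both steps still impose a linear covariate effect; the flexible approaches discussed in the abstract replace such fixed functional forms with penalized regression splines.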