Estimation of a regression spline sample selection model

Authors:
Giampiero Marra;Rosalba Radice
Affiliations:
Department of Statistical Science, University College London, London WC1E 6BT, UK;Department of Economics, Mathematics and Statistics, Birkbeck, University of London, London WC1E 7HX, UK
Venue:
Computational Statistics & Data Analysis
Year:
2013

Abstract

It is often the case that an outcome of interest is observed for a restricted non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models, which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation, and an outcome equation used to examine the substantive question of interest. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of a continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart, which has the advantage of being computationally fast, is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures.
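
The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' penalized likelihood algorithm. It only generates data from a simple two-equation selection setup with correlated errors and compares an ordinary least-squares line fitted to the full sample with one fitted to the selected sample alone. The function names (`ols_line`, `simulate_selection`), the coefficients, and the error correlation `rho` are illustrative assumptions, not values from the paper.

```python
import math
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_selection(n=20000, rho=0.7, seed=0):
    """Simulate a sample selection setup (all numeric values are assumptions).

    Selection equation: unit is observed if 0.5 + x + z + e1 > 0.
    Outcome equation:   y = 1 + 2 * x + e2, with corr(e1, e2) = rho.
    Returns the OLS fit on all units, the OLS fit on selected units only,
    and the fraction of units selected.
    """
    rng = random.Random(seed)
    x_all, y_all, x_sel, y_sel = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # regressor entering both equations
        z = rng.gauss(0.0, 1.0)   # regressor entering selection only
        e1 = rng.gauss(0.0, 1.0)  # selection error
        # Outcome error built to have unit variance and correlation rho with e1.
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + e2
        x_all.append(x)
        y_all.append(y)
        if 0.5 + x + z + e1 > 0:  # outcome is observed only for selected units
            x_sel.append(x)
            y_sel.append(y)
    return ols_line(x_all, y_all), ols_line(x_sel, y_sel), len(x_sel) / n


if __name__ == "__main__":
    full_fit, selected_fit, fraction = simulate_selection()
    print("full sample     (intercept, slope):", full_fit)
    print("selected sample (intercept, slope):", selected_fit)
    print("fraction selected:", fraction)
```

Under these assumptions the full-sample fit recovers an intercept near 1 and a slope near 2, while the fit on the selected units overstates the intercept and understates the slope, because the outcome error is correlated with the selection error. Setting `rho=0` removes the correlation, and the two fits then agree up to sampling noise.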