Facilitating score and causal inference trees for large observational studies

  • Authors:
  • Xiaogang Su; Joseph Kang; Juanjuan Fan; Richard A. Levine; Xin Yan

  • Affiliations:
  • School of Nursing, University of Alabama at Birmingham, Birmingham, AL; Department of Preventive Medicine, Northwestern University, Chicago, IL; Department of Mathematics and Statistics, San Diego State University, San Diego, CA; Department of Mathematics and Statistics, San Diego State University, San Diego, CA; Department of Statistics, University of Central Florida, Orlando, FL

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2012

Abstract

Assessing treatment effects in observational studies is a multifaceted problem. It involves not only heterogeneous mechanisms by which subjects are exposed to the treatment or cause, known as propensity, but also differential causal effects across subpopulations. We introduce a concept termed the facilitating score to account for both the confounding and the interacting impacts of covariates on the treatment effect. Several approaches for estimating the facilitating score are discussed. In particular, we put forward a machine learning method, called the causal inference tree (CIT), that provides a piecewise constant approximation of the facilitating score. Using interpretable rules, CIT splits the data so that both the propensity and the treatment effect become more homogeneous within each resulting partition. Causal inference can then be carried out at different levels on the basis of CIT. Combined with an aggregated grouping procedure, CIT stratifies the data into strata within each of which causal effects can be conveniently assessed. In addition, aggregating an ensemble of CIT models offers a feasible way of predicting individual causal effects (ICE). Both the stratified results and the estimated ICE provide an assessment of the heterogeneity of causal effects and can be integrated to estimate the average causal effect (ACE). The mean-square consistency of CIT is also established. We evaluate the performance of the proposed methods with simulations and illustrate their use with the NSW data from Dehejia and Wahba (1999), where the objective is to assess the impact of a labor training program, the National Supported Work (NSW) demonstration, on post-intervention earnings.
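The abstract notes that stratified results can be integrated into an estimate of the ACE. The following is a minimal Python sketch of one standard way to do this, a subclassification estimator: compute the treated-minus-control mean difference within each stratum, then average these differences weighted by stratum size. It assumes the strata have already been produced (for example, as terminal nodes of a fitted tree). The function name `stratified_ace` and its inputs are hypothetical, and the sketch is not presented as the authors' exact procedure.

```python
from collections import defaultdict


def stratified_ace(strata, treated, outcome):
    """Hypothetical helper: size-weighted average of within-stratum effects.

    strata  : stratum label per subject (assumed to come from a fitted tree)
    treated : 1 if the subject received the treatment, 0 otherwise
    outcome : observed response per subject
    Returns (ace, per_stratum), where per_stratum maps label -> (effect, size).
    """
    # Collect outcomes by stratum and by treatment arm.
    groups = defaultdict(lambda: {1: [], 0: []})
    for s, t, y in zip(strata, treated, outcome):
        groups[s][t].append(y)

    per_stratum = {}
    total = 0
    for s, arms in groups.items():
        if not arms[1] or not arms[0]:
            # A stratum missing either arm gives no within-stratum contrast; skip it.
            continue
        effect = sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
        size = len(arms[1]) + len(arms[0])
        per_stratum[s] = (effect, size)
        total += size

    if total == 0:
        raise ValueError("no stratum contains both treated and control subjects")

    # Weight each stratum's effect by its share of the retained subjects.
    ace = sum(effect * size for effect, size in per_stratum.values()) / total
    return ace, per_stratum


ace, per_stratum = stratified_ace(
    ["A", "A", "A", "A", "B", "B", "B", "B"],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [10, 12, 5, 7, 21, 14, 15, 16],
)
print(ace)  # 5.5: stratum effects 5.0 and 6.0, each with weight 4/8
```

Weighting each stratum by its total size targets the average effect over the retained population. Weighting by the number of treated subjects instead would target the effect on the treated. Strata without both arms are dropped in this sketch, so the estimate covers only the region of common support.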