Gradient boosting trees for auto insurance loss cost modeling and prediction

  • Authors: Leo Guelman
  • Affiliation: Royal Bank of Canada, RBC Insurance, 6880 Financial Drive, Mississauga, Ontario, Canada L5N 7Y5
  • Venue: Expert Systems with Applications: An International Journal
  • Year: 2012


Abstract

Gradient Boosting (GB) is an iterative algorithm that combines simple parameterized functions with "poor" performance (high prediction error) to produce a highly accurate prediction rule. In contrast to other statistical learning methods that usually provide comparable accuracy (e.g., neural networks and support vector machines), GB gives interpretable results while requiring little data preprocessing and parameter tuning. The method is highly robust to less-than-clean data and can be applied to classification or regression problems arising from a variety of response distributions (Gaussian, Bernoulli, Poisson, and Laplace). Complex interactions are modeled simply, missing values in the predictors are handled with almost no loss of information, and feature selection is performed as an integral part of the procedure. These properties make GB a good candidate for insurance loss cost modeling. However, to the best of our knowledge, the application of this method to insurance pricing has not been fully documented to date. This paper presents the theory of GB and its application to the problem of predicting auto "at-fault" accident loss cost using data from a major Canadian insurer. The predictive accuracy of the model is compared against the conventional Generalized Linear Model (GLM) approach.
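The iterative scheme the abstract describes — repeatedly fitting a simple, individually weak function to the shortfall of the current ensemble — can be illustrated with a minimal from-scratch sketch. This is not the paper's implementation (which models insurer loss cost data with the distributions listed above); it is a hypothetical example using depth-1 regression trees ("stumps") under a squared-error (Gaussian) loss on synthetic data, where the negative gradient is simply the residual.

```python
import numpy as np

def fit_stump(x, target):
    """Fit a depth-1 regression tree (a stump) on a single feature."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate split thresholds
        left, right = target[x <= t], target[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left-leaf value, right-leaf value)

def predict_stump(stump, x):
    t, left_val, right_val = stump
    return np.where(x <= t, left_val, right_val)

def gradient_boost(x, y, n_rounds=100, lr=0.1):
    """Each round fits a stump to the residuals, i.e. the negative
    gradient of the squared-error loss at the current prediction."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred            # negative gradient (Gaussian loss)
        stump = fit_stump(x, residual)
        pred += lr * predict_stump(stump, x)
        stumps.append(stump)
    return y.mean(), stumps, lr

def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)

# Synthetic illustration: an ensemble of "poor" stumps recovers a smooth curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.1, 300)
model = gradient_boost(x, y)
mse = ((y - predict(model, x)) ** 2).mean()
```

Only the `residual = ...` line is specific to the Gaussian case; swapping in the gradient of a Poisson or Laplace loss (per the distribution list in the abstract) adapts the same loop to those response types.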