Gradient Boosting (GB) is an iterative algorithm that combines simple parameterized functions with "poor" performance (high prediction error) to produce a highly accurate prediction rule. In contrast to other statistical learning methods that usually provide comparable accuracy (e.g., neural networks and support vector machines), GB gives interpretable results while requiring little data preprocessing and parameter tuning. The method is highly robust to less-than-clean data and can be applied to classification or regression problems with a variety of response distributions (Gaussian, Bernoulli, Poisson, and Laplace). Complex interactions are modeled simply, missing values in the predictors are managed almost without loss of information, and feature selection is performed as an integral part of the procedure. These properties make GB a good candidate for insurance loss cost modeling. However, to the best of our knowledge, the application of this method to insurance pricing has not been fully documented to date. This paper presents the theory of GB and its application to the problem of predicting auto "at-fault" accident loss cost using data from a major Canadian insurer. The predictive accuracy of the model is compared against the conventional Generalized Linear Model (GLM) approach.
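The additive scheme the abstract describes can be illustrated with a minimal sketch: squared-error gradient boosting over one feature, using depth-1 regression "stumps" as the weak learners. All function names and parameters below are illustrative choices, not from the paper, which concerns Poisson/Tweedie-style loss cost modeling on real insurance data.

```python
# Minimal gradient boosting sketch (squared error, one feature).
# Each round fits a stump to the current residuals (the negative
# gradient of squared error) and adds a shrunken copy to the model.

def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue  # degenerate split, skip
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, n_rounds=200, learning_rate=0.1):
    """Combine stumps additively; each one corrects prior residuals."""
    base = sum(y) / len(y)              # initial constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(xi) for p, xi in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)

# Toy usage: the ensemble of weak stumps recovers a step function.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
```

The learning-rate shrinkage is what makes each individual stump "weak" in effect: no single round corrects the residuals fully, but the sum of many small corrections converges to an accurate predictor.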