Gradient Boosting (GB) is an iterative algorithm that combines simple parameterized functions with "poor" performance (high prediction error) to produce a highly accurate prediction rule. In contrast to other statistical learning methods that usually provide comparable accuracy (e.g., neural networks and support vector machines), GB gives interpretable results while requiring little data preprocessing and parameter tuning. The method is highly robust to less-than-clean data and can be applied to classification or regression problems with a variety of response distributions (Gaussian, Bernoulli, Poisson, and Laplace). Complex interactions are modeled simply, missing values in the predictors are managed almost without loss of information, and feature selection is performed as an integral part of the procedure. These properties make GB a good candidate for insurance loss cost modeling. However, to the best of our knowledge, the application of this method to insurance pricing has not been fully documented to date. This paper presents the theory of GB and its application to the problem of predicting auto "at-fault" accident loss cost using data from a major Canadian insurer. The predictive accuracy of the model is compared against the conventional Generalized Linear Model (GLM) approach.
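The additive scheme the abstract describes can be illustrated with a minimal sketch: squared-error gradient boosting over one feature, using depth-1 regression "stumps" as the weak learners. All function names and parameters below are illustrative choices, not from the paper, which concerns Poisson/Tweedie-style loss cost modeling on real insurance data.

```python
# Minimal gradient boosting sketch (squared error, one feature).
# Each round fits a stump to the current residuals (the negative
# gradient of squared error) and adds a shrunken copy to the model.

def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue  # degenerate split, skip
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, n_rounds=200, learning_rate=0.1):
    """Combine stumps additively; each one corrects prior residuals."""
    base = sum(y) / len(y)              # initial constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(xi) for p, xi in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)

# Toy usage: the ensemble of weak stumps recovers a step function.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
```

The learning-rate shrinkage is what makes each individual stump "weak" in effect: no single round corrects the residuals fully, but the sum of many small corrections converges to an accurate predictor.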