Accurate intelligible models with pairwise interactions

Authors:
Yin Lou;Rich Caruana;Johannes Gehrke;Giles Hooker
Affiliations:
Cornell University, Ithaca, New York, USA;Microsoft Research, Redmond, Washington, USA;Cornell University, Ithaca, New York, USA;Cornell University, Ithaca, New York, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, they are significantly less accurate than more complex models that permit interactions. In this paper, we suggest adding selected pairwise interaction terms to standard GAMs. The resulting models, which we call GA2M-models (Generalized Additive Models plus Interactions), consist of univariate terms and a small number of pairwise interaction terms. Because these models include only one- and two-dimensional components, the components of GA2M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of feature pairs, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion in the model. In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models perform almost as well as the best full-complexity models on a number of real datasets. Thus this paper postulates that, for many problems, GA2M-models can be both intelligible and accurate.
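
The model form described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not reproduce FAST. It only shows how a prediction would be assembled from univariate terms plus a few pairwise terms, and how quickly the number of candidate pairs grows. The names `ga2m_predict` and `num_candidate_pairs`, and the toy shape functions, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type aliases for this sketch: a shape function maps one
# feature value (or a pair of feature values) to an additive contribution.
Univariate = Callable[[float], float]
Pairwise = Callable[[float, float], float]


def ga2m_predict(
    x: Sequence[float],
    intercept: float,
    univariate: Dict[int, Univariate],
    pairwise: Dict[Tuple[int, int], Pairwise],
) -> float:
    """Sum an intercept, one-dimensional terms, and selected two-dimensional terms."""
    total = intercept
    for i, f_i in univariate.items():
        total += f_i(x[i])            # univariate term for feature i
    for (i, j), f_ij in pairwise.items():
        total += f_ij(x[i], x[j])     # interaction term for the pair (i, j)
    return total


def num_candidate_pairs(n_features: int) -> int:
    """Number of unordered feature pairs, n(n-1)/2, which grows quadratically."""
    return n_features * (n_features - 1) // 2


# Toy shape functions (assumed for illustration, not learned from data).
univariate_terms = {0: lambda v: 2.0 * v, 1: lambda v: v * v}
pairwise_terms = {(0, 1): lambda a, b: 0.5 * a * b}

prediction = ga2m_predict([1.0, 3.0], 1.0, univariate_terms, pairwise_terms)
pairs_for_100_features = num_candidate_pairs(100)
```

Because every term depends on at most two features, each one can be plotted on its own as a curve or a heat map, which is the property the abstract relies on for intelligibility.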