Zero-Inflated Boosted Ensembles for Rare Event Counts

Authors:
Alexander Borisov; George Runger; Eugene Tuv; Nuttha Lurponglukana-Strand
Affiliations:
Intel, Chandler; Industrial and Systems Engineering, Arizona State University, Tempe; Intel, Chandler; Industrial and Systems Engineering, Arizona State University, Tempe
Venue:
IDA '09 Proceedings of the 8th International Symposium on Intelligent Data Analysis: Advances in Intelligent Data Analysis VIII
Year:
2009

Abstract

Two linked ensembles are used for a supervised learning problem with rare-event counts. When most target values are zero, traditional loss functions (such as squared error and classification error) are often not relevant; a statistical model instead leads to a likelihood with two related parameters from a zero-inflated Poisson (ZIP) distribution. In a new approach, a linked pair of gradient boosted tree ensembles is developed to handle the multiple parameters in a manner that can be generalized to other problems. The result is a unique learner that extends machine learning methods to data with nontraditional structures. We empirically compare the method against a single-tree approach (ZIP-tree) and a statistical generalized linear model on two real data sets and two artificial data sets.
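
To make the two-parameter ZIP likelihood mentioned in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the standard ZIP form, in which a count is a structural zero with probability p and otherwise Poisson with mean lam. It also assumes a logit link for p and a log link for lam; the abstract does not state which links the paper uses. The function names zip_log_pmf and zip_gradients and the symbols nu and mu are hypothetical. The gradients are the per-observation quantities that a pair of gradient boosted ensembles, one per parameter, could plausibly fit at each iteration.

```python
import math


def zip_log_pmf(y, p, lam):
    """Log-probability of count y under a zero-inflated Poisson.

    p   : probability of a structural (inflated) zero, 0 < p < 1
    lam : mean of the Poisson component, lam > 0
    """
    if y == 0:
        # A zero arises either structurally or from the Poisson part.
        return math.log(p + (1.0 - p) * math.exp(-lam))
    # Positive counts can only come from the Poisson part.
    return math.log1p(-p) - lam + y * math.log(lam) - math.lgamma(y + 1)


def zip_gradients(y, nu, mu):
    """Gradient of the log-likelihood on the assumed link scales.

    nu = logit(p) and mu = log(lam) are assumed links, chosen here
    for illustration. Returns (d loglik / d nu, d loglik / d mu).
    """
    p = 1.0 / (1.0 + math.exp(-nu))
    lam = math.exp(mu)
    if y == 0:
        p0 = p + (1.0 - p) * math.exp(-lam)  # P(Y = 0)
        d_nu = p * (1.0 - p) * (1.0 - math.exp(-lam)) / p0
        d_mu = -(1.0 - p) * lam * math.exp(-lam) / p0
    else:
        d_nu = -p
        d_mu = y - lam
    return d_nu, d_mu
```

Under these assumptions the two gradients share the terms p and lam when y is zero, which is one way to see why the two parameters are related and why the ensembles would need to be linked rather than trained independently.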