Looking for lumps: boosting and bagging for density estimation

  • Authors:
  • Greg Ridgeway

  • Affiliations:
  • RAND Corporation, RAND Statistics Group, 1700 N. Main Street, Santa Monica, CA

  • Venue:
  • Computational Statistics & Data Analysis - Nonlinear methods and data mining
  • Year:
  • 2002

Abstract

The solution to data mining problems often involves discovering non-linear relationships in large, noisy datasets. Bagging, boosting, and their variations have produced an interesting new class of techniques for finding these relationships in prediction problems. In this paper I extend these methods to the design of density estimation algorithms for large, noisy, high-dimensional datasets. Analogous to the boosting framework, the algorithms iteratively mix the current density estimator with an additional density chosen in a greedy fashion to optimize a fit criterion. A bagging step helps to control overfitting by providing better estimates of the fit criterion. I derive optimization algorithms for the boosting steps, discuss strategies for massive datasets, and show results from real and simulated problems.
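The greedy mixing idea can be illustrated with a minimal sketch. Each step replaces the current estimate f with (1 - alpha) * f + alpha * g, choosing the new component g and the weight alpha to maximize the log-likelihood. The sketch assumes 1-D data, a single-Gaussian starting fit, candidate components that are fixed-width Gaussians centered at the data points, and alpha picked from a small grid. It is not the optimization algorithm derived in the paper, and it leaves out the bagging step. All function names and parameters (`boost_density`, `sigma`, `alphas`, and so on) are hypothetical.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Evaluate a mixture given as a list of (weight, mu, sigma) triples."""
    return sum(w * gauss_pdf(x, mu, s) for w, mu, s in components)


def log_lik(data, components):
    """In-sample log-likelihood of the mixture (the fit criterion used here)."""
    return sum(math.log(mixture_pdf(x, components)) for x in data)


def boost_density(data, n_steps=5, sigma=0.5, alphas=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """Hypothetical greedy mixture 'boosting' for a 1-D density.

    Each step sets f <- (1 - a) * f + a * g, where g is a Gaussian of fixed
    width `sigma` centered at one of the data points, and (center, a) are
    chosen by grid search to maximize the log-likelihood. No bagging.
    """
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n) or 1.0
    components = [(1.0, mean, sd)]  # start from a single Gaussian fit

    for _ in range(n_steps):
        current = [mixture_pdf(x, components) for x in data]  # f(x_i)
        best_ll = sum(math.log(c) for c in current)
        best = None
        for mu in sorted(set(data)):  # candidate component centers
            g = [gauss_pdf(x, mu, sigma) for x in data]  # g(x_i)
            for a in alphas:
                ll = sum(math.log((1 - a) * c + a * gi) for c, gi in zip(current, g))
                if ll > best_ll:
                    best_ll, best = ll, (mu, a)
        if best is None:  # no candidate improves the fit: stop early
            break
        mu, a = best
        # Shrink existing weights by (1 - a) and append the new component,
        # so the mixture weights still sum to one.
        components = [(w * (1 - a), m, s) for w, m, s in components]
        components.append((a, mu, sigma))
    return components
```

For example, on two well-separated clumps the sketch adds narrow components over the clumps that a single Gaussian misses:

```python
data = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1]
comps = boost_density(data, n_steps=3)
print(comps, log_lik(data, comps))
```

Because every accepted step must raise the in-sample log-likelihood, this version can still overfit as `sigma` shrinks. The abstract says the paper controls that with a bagging step that gives better estimates of the fit criterion.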