Looking for lumps: boosting and bagging for density estimation

Authors:
Greg Ridgeway
Affiliations:
RAND Corporation, RAND Statistics Group, 1700 N. Main Street, Santa Monica, CA
Venue:
Computational Statistics & Data Analysis - Nonlinear methods and data mining
Year:
2002

Citing 6
Cited 6

Bagging predictors

Machine Learning
A decision-theoretic generalization of on-line learning and an application to boosting

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual European conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
Squashing flat files flatter

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Prediction games and arcing algorithms

Neural Computation
Using Iterated Bagging to Debias Regressions

Machine Learning
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality

Data Mining and Knowledge Discovery

Density estimation with stagewise optimization of the empirical risk

Machine Learning
Probability Density Estimation by Perturbing and Combining Tree Structured Markov Networks

ECSQARU '09 Proceedings of the 10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography

Computational Statistics & Data Analysis
Classification Based on Combination of Kernel Density Estimators

ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part II
Incremental aspect models for mining document streams

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Density estimation with minimization of U-divergence

Machine Learning

Abstract

The solution to data mining problems often involves discovering non-linear relationships in large, noisy datasets. Bagging, boosting, and their variations have produced an interesting new class of techniques for finding these relationships in prediction problems. In this paper I extend these methods to the design of algorithms for density estimation on large, noisy, high-dimensional datasets. Analogous to the boosting framework, the algorithms iteratively mix the current density estimator with an additional density chosen in a greedy fashion to optimize a fit criterion. A bagging step helps to control overfitting by providing better estimates of the fit criterion. I derive optimization algorithms for the boosting steps, discuss strategies for massive datasets, and show results from real and simulated problems.
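
The iterative mixing described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes one-dimensional data, Gaussian components, average log-likelihood as the fit criterion, and a plain grid search in place of the derived optimization steps. The out-of-bag scoring is one simple reading of the "bagging step". All function names and parameters (`boost_density`, `sigmas`, `alphas`, `n_centers`) are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Evaluate a mixture given as a list of (weight, mu, sigma) triples."""
    return sum(w * normal_pdf(x, mu, s) for w, mu, s in components)


def mean_log_lik(xs, components):
    """Assumed fit criterion: average log-likelihood of xs under the mixture."""
    # The tiny constant guards against log(0) far out in the tails.
    return sum(math.log(mixture_pdf(x, components) + 1e-300) for x in xs) / len(xs)


def boost_density(data, n_rounds=10, sigmas=(0.25, 0.5, 1.0),
                  alphas=(0.05, 0.1, 0.2, 0.4), n_centers=20, seed=0):
    """Greedily grow a mixture: f_new = (1 - alpha) * f_old + alpha * g."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    components = [(1.0, mean, sd)]  # initial estimate: a single Gaussian

    for _ in range(n_rounds):
        # Bagging-style step (a simplification): candidate centres come from
        # a bootstrap sample, and the fit criterion is evaluated on the
        # out-of-bag points that the bootstrap sample missed.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [data[i] for i in range(n) if i not in in_bag]
        if not oob:
            continue
        centers = [data[i] for i in idx[:n_centers]]

        best_score = mean_log_lik(oob, components)
        best = None
        for mu in centers:
            for s in sigmas:
                for a in alphas:
                    # Shrink the old weights by (1 - a) and add the new density.
                    trial = [((1.0 - a) * w, m, sg) for w, m, sg in components]
                    trial.append((a, mu, s))
                    score = mean_log_lik(oob, trial)
                    if score > best_score:
                        best_score, best = score, trial
        if best is None:
            break  # no candidate improves the out-of-bag fit; stop early
        components = best
    return components
```

Calling `boost_density(data)` on a list of floats returns the fitted mixture as `(weight, mu, sigma)` triples whose weights sum to one. Stopping when no candidate improves the out-of-bag criterion is a design choice of this sketch, meant to mirror the overfitting control the abstract attributes to bagging.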