Machine Learning
The solution to data mining problems often involves discovering non-linear relationships in large, noisy datasets. Bagging, boosting, and their variations have produced an interesting new class of techniques for finding these relationships in prediction problems. In this paper I extend these methods to the design of algorithms for density estimation for large, noisy, high-dimensional datasets. Analogous to the boosting framework, the algorithms iteratively mix the current density estimator with an additional density chosen in a greedy fashion to optimize a fit criterion. A bagging step helps to control overfitting by providing better estimates of the fit criterion. I derive optimization algorithms for the boosting steps, discuss strategies for massive datasets, and show results from real and simulated problems.
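The iterative scheme the abstract describes can be sketched in code. The following is a minimal illustration, not the paper's actual algorithm: it assumes univariate data, Gaussian mixture components, a fixed bump width `sigma`, greedy placement of each new component at the data point the current model assigns the least density to, and a small grid search for the mixing weight scored by a bootstrap-averaged log-likelihood (a stand-in for the paper's bagged estimate of the fit criterion). All of these choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_pdf(x, mu, sigma):
    # standard univariate normal density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def boosted_density_estimate(x, n_steps=10, sigma=0.5, n_boot=5):
    """Greedy, boosting-style mixture construction (illustrative sketch):
    start from one broad Gaussian, then repeatedly mix in a narrow component
    centered where the current model is thinnest on the data."""
    components = [(x.mean(), x.std())]   # (mu, sigma) pairs
    weights = np.array([1.0])            # mixing weights; always sum to 1

    def density(pts):
        return sum(w * gaussian_pdf(pts, mu, s)
                   for w, (mu, s) in zip(weights, components))

    for _ in range(n_steps):
        # greedy component choice: put a bump at the data point the
        # current mixture assigns the least density to
        cur = density(x)
        mu_new = x[np.argmin(cur)]
        new = gaussian_pdf(x, mu_new, sigma)

        # choose the mixing weight alpha from a grid, scoring each
        # candidate by log-likelihood averaged over bootstrap resamples
        # (a crude stand-in for the paper's bagging step)
        best_alpha, best_score = 0.0, -np.inf
        for alpha in np.linspace(0.05, 0.5, 10):
            mix = (1.0 - alpha) * cur + alpha * new
            boot = [np.log(mix[rng.integers(0, len(x), len(x))]).mean()
                    for _ in range(n_boot)]
            if np.mean(boot) > best_score:
                best_alpha, best_score = alpha, np.mean(boot)

        # convex update keeps the mixture a valid density
        weights = np.append((1.0 - best_alpha) * weights, best_alpha)
        components.append((mu_new, sigma))

    return density

# two-lump data: the fitted mixture should spread mass over both modes
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
f = boosted_density_estimate(x)
```

Because each update is a convex combination of the old mixture and a normalized component, the estimate remains a proper density at every step; the bootstrap averaging in the weight search is what keeps a single greedy bump from being given too much mass.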