Almost all machine learning algorithms, whether for regression, classification, or density estimation, seek hypotheses that optimize a score on training data. In most interesting cases, however, full global optimization is not feasible, and local search techniques are used to discover reasonable solutions. Unfortunately, the quality of the local maxima reached depends on initialization and often falls short of the global maximum. In this paper, we present a simple approach for combining global search with local optimization to discover improved hypotheses in general machine learning problems. The main idea is to escape local maxima by perturbing the training data to create plausible new ascent directions, rather than perturbing hypotheses directly. Specifically, we consider example-reweighting strategies that are reminiscent of boosting and other ensemble learning methods, but applied in a different way with a different goal: to produce a single hypothesis that achieves a good score on training and test data. To evaluate the performance of our algorithms, we consider a number of problems in learning Bayesian networks from data, including discrete training problems (structure search), continuous training problems (parametric EM, non-linear logistic regression), and mixed training problems (Structural EM), on both synthetic and real-world data. In each case, we obtain state-of-the-art performance on both training and test data.
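A minimal sketch of the data-perturbation idea described above, using weighted k-means as a stand-in local-search learner (k-means, the exponential reweighting rule, and the annealing schedule here are illustrative assumptions, not the paper's actual experiments): at each local optimum we upweight poorly fit examples, re-run local search on the perturbed data, keep the candidate only if it improves the original unweighted score, and anneal the perturbation back toward uniform weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three well-separated clusters.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [4, 0], [0, 4])])

def weighted_kmeans(X, w, centers, iters=50):
    """Plain Lloyd iterations on example-weighted data (the local search)."""
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        for k in range(len(centers)):
            mask = z == k
            if mask.any():
                centers[k] = np.average(X[mask], axis=0, weights=w[mask])
    return centers

def score(X, centers):
    """Unweighted objective: total squared distance to the nearest center
    (lower is better; this is the score we actually care about)."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(1).sum()

# Deliberately poor initialization, then plain local search.
centers = weighted_kmeans(X, np.ones(len(X)), X[:3].copy())
best, best_score = centers.copy(), score(X, centers)
init_score = best_score

# Escape attempts: perturb the *data weights* (boosting-like upweighting of
# badly fit examples), not the hypothesis itself.
tau = 2.0
for _ in range(20):
    d = ((X[:, None, :] - best[None, :, :]) ** 2).sum(-1).min(1)
    w = np.exp(tau * d / d.max())      # heavier weight on poorly fit points
    cand = weighted_kmeans(X, w, best.copy())
    if score(X, cand) < best_score:    # accept only unweighted improvements
        best, best_score = cand, score(X, cand)
    tau *= 0.8                         # anneal toward uniform weights

print(best_score <= init_score)
```

Because candidates are accepted only when they improve the unweighted score, the procedure can never do worse than the plain local search it starts from; the reweighting merely proposes new ascent (here, descent) directions.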