Geometric optimization methods for adaptive filtering
Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For such models, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small because the approximating distribution is simple. Experiments with real-world speech data show significant speedups over alternative learning algorithms.
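To make the idea concrete, the sketch below scales the Euclidean gradient of the variational objective by the inverse Fisher information of a fully factorized Gaussian approximation q(theta) = N(mu, diag(var)) and combines the resulting natural gradients with Polak-Ribiere conjugate directions. This is only a minimal sketch under assumptions not stated in the abstract: the fixed-form Gaussian parameterization, a hypothetical callback `elbo_grad` standing in for the gradient of the objective, a fixed step size instead of a line search, and plain Euclidean inner products in the conjugation coefficient.

```python
import numpy as np

def natural_gradient(grad_mu, grad_var, var):
    """Multiply Euclidean gradients by the inverse Fisher information of a
    factorized Gaussian q(theta) = N(mu, diag(var)).  For a univariate
    N(mu, s2) the Fisher information is diag(1/s2, 1/(2*s2**2)), so the
    natural gradient is (s2 * dF/dmu, 2*s2**2 * dF/ds2)."""
    return var * grad_mu, 2.0 * var**2 * grad_var

def riemannian_cg_step(mu, var, elbo_grad, prev_dir=None, prev_nat=None, step=1e-2):
    """One approximate Riemannian conjugate-gradient ascent step on (mu, var).
    `elbo_grad(mu, var)` is a hypothetical callback returning the Euclidean
    gradients of the variational objective with respect to mu and var."""
    g_mu, g_var = elbo_grad(mu, var)
    nat = np.concatenate(natural_gradient(g_mu, g_var, var))
    if prev_dir is None:
        direction = nat                      # first step: plain natural gradient
    else:
        # Polak-Ribiere coefficient; Euclidean inner products used for brevity.
        beta = max(0.0, nat @ (nat - prev_nat) / (prev_nat @ prev_nat))
        direction = nat + beta * prev_dir
    d_mu, d_var = np.split(direction, 2)
    new_var = np.maximum(var + step * d_var, 1e-8)   # keep variances positive
    return mu + step * d_mu, new_var, direction, nat
```

Scaling by var and 2*var**2 is exactly the inverse of the Fisher information of a univariate Gaussian, which is why the per-iteration overhead stays small when the approximating distribution factorizes: no extra matrix inversions are needed beyond elementwise operations on the variational parameters.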