Training multilayer perceptrons with the extended Kalman algorithm. Advances in Neural Information Processing Systems 1.
Fast exact multiplication by the Hessian. Neural Computation.
Additive versus exponentiated gradient updates for linear prediction. STOC '95: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing.
Natural gradient works efficiently in learning. Neural Computation.
A fast, compact approximation of the exponential function. Neural Computation.
Neural Networks for Pattern Recognition
Fast Second-Order Gradient Descent via O(n) Curvature Matrix-Vector Products
The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information matrix. We extend it from nonlinear least squares to arbitrary differentiable objectives, such that positive semi-definiteness is maintained for the standard loss functions used in neural network regression and classification. We give efficient algorithms for computing the product of the extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to, but even cheaper than, the fast Hessian-vector product [1]. The stability of SMD [2,3,4,5], a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
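The matrix-vector products described above can be illustrated with a minimal NumPy sketch. For a Gauss-Newton matrix G = J^T H_L J (J the Jacobian of the network outputs with respect to the weights, H_L the Hessian of the loss with respect to the outputs), Gv is computed by one forward tangent propagation (yielding Jv) followed by one ordinary backward pass with Jv in place of the error signal, never forming J or G explicitly. The one-hidden-layer tanh network and squared-error loss below (where H_L = I) are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def gauss_newton_vector_product(W1, W2, x, V1, V2):
    """Compute G v = J^T H_L J v for a one-hidden-layer tanh MLP with
    squared-error loss (so H_L = I), without forming J or G.
    (V1, V2) is the vector v, laid out like the weights (W1, W2)."""
    # ordinary forward pass
    a = x @ W1
    h = np.tanh(a)
    # tangent (R-operator) forward pass: propagate v to get J v
    Ra = x @ V1
    Rh = (1.0 - h**2) * Ra
    Jv = Rh @ W2 + h @ V2          # directional derivative of the outputs
    # for squared error, the Hessian of the loss w.r.t. the outputs is the
    # identity, so the "error" fed into the backward pass is just J v
    e = Jv
    # ordinary backward pass applied to e yields J^T e = G v
    GV2 = h.T @ e
    dh = e @ W2.T
    da = dh * (1.0 - h**2)
    GV1 = x.T @ da
    return GV1, GV2
```

Since the backward pass is applied to a tangent rather than a residual, the cost is two passes through the network per product, and the resulting quadratic form v^T G v = ||Jv||^2 is nonnegative by construction, matching the positive semi-definiteness guarantee discussed in the abstract.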