Just storing the Hessian $H$ (the matrix of second derivatives $\partial^2 E / \partial w_i \partial w_j$ of the error $E$ with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like $H$ is to compute its product with various vectors, we derive a technique that directly calculates $Hv$, where $v$ is an arbitrary vector. To calculate $Hv$, we first define a differential operator $\mathcal{R}_v\{f(w)\} = (\partial/\partial r)\, f(w + rv)\big|_{r=0}$, note that $\mathcal{R}_v\{\nabla_w\} = Hv$ and $\mathcal{R}_v\{w\} = v$, and then apply $\mathcal{R}_v\{\cdot\}$ to the equations used to compute $\nabla_w$. The result is an exact and numerically stable procedure for computing $Hv$, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of $H$, obviating any need to calculate the full Hessian.
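
As a minimal sketch (not the paper's original implementation), note that $\mathcal{R}_v\{\nabla_w E\}$ is a forward-mode directional derivative of a reverse-mode gradient, so modern automatic differentiation realizes the procedure directly. In JAX, with an illustrative scalar error `loss` (the loss and the variable names here are assumptions for demonstration only):

    import jax
    import jax.numpy as jnp

    def loss(w):
        # Illustrative scalar error E(w); any differentiable loss works.
        return jnp.sum(jnp.tanh(w) ** 2)

    def hvp(w, v):
        # R_v{grad E} = (d/dr) grad E(w + r v) |_{r=0} = H v:
        # a forward-mode jvp of the reverse-mode gradient, computed
        # exactly, at roughly the cost of one extra gradient pass.
        return jax.jvp(jax.grad(loss), (w,), (v,))[1]

    w = jnp.array([0.5, -1.0, 2.0])
    v = jnp.array([1.0, 0.0, -1.0])
    print(hvp(w, v))  # H v, without ever forming H

Because this yields $Hv$ without materializing $H$, it can feed iterative methods (e.g., conjugate gradient or the power method) that extract properties of $H$ from matrix-vector products alone, as the abstract notes.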