Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster "natural" learning algorithms and more robust pruning procedures.
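To make the core idea concrete, the natural gradient update replaces the plain gradient step Δθ = −η ∇E(θ) with Δθ = −η F⁻¹ ∇E(θ), where F is the Fisher information matrix; adding a damping term to F before inverting recovers a Levenberg-Marquardt-style step. The sketch below is a minimal illustration, not the article's implementation: it assumes a single-hidden-layer perceptron for regression with Gaussian noise (so the empirical Fisher reduces to the averaged outer product of per-example gradients) and reads the layered approximation as keeping only the per-layer diagonal blocks of F, which is an assumption on our part. All function and variable names (per_example_grad, layered_fisher, natural_step, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-hidden-layer MLP: f(x) = v . tanh(W x)
n_in, n_hid = 3, 4
n_w = n_hid * n_in  # number of hidden-layer weights

def unpack(theta):
    """Split the flat parameter vector into layer matrices W and v."""
    return theta[:n_w].reshape(n_hid, n_in), theta[n_w:]

def per_example_grad(theta, x, y):
    """Gradient of the squared error 0.5*(f(x) - y)^2 w.r.t. all weights."""
    W, v = unpack(theta)
    h = np.tanh(W @ x)
    err = v @ h - y
    g_v = err * h                                  # output-layer gradient
    g_W = err * np.outer(v * (1.0 - h**2), x)      # hidden-layer gradient
    return np.concatenate([g_W.ravel(), g_v])

def empirical_fisher(theta, X, Y):
    """Outer-product (empirical) Fisher: average of g g^T over examples."""
    G = np.stack([per_example_grad(theta, x, y) for x, y in zip(X, Y)])
    return G.T @ G / len(X)

def layered_fisher(F):
    """Assumed layered approximation: keep only per-layer diagonal blocks,
    which is far cheaper to build and invert layer by layer."""
    F_app = np.zeros_like(F)
    F_app[:n_w, :n_w] = F[:n_w, :n_w]  # hidden-layer block
    F_app[n_w:, n_w:] = F[n_w:, n_w:]  # output-layer block
    return F_app

def natural_step(theta, X, Y, eta=0.1, damping=1e-3, layered=False):
    """One damped natural-gradient step: theta - eta * (F + lam I)^-1 g.
    The damping term gives the update its Levenberg-Marquardt flavor."""
    F = empirical_fisher(theta, X, Y)
    if layered:
        F = layered_fisher(F)
    g = np.mean([per_example_grad(theta, x, y) for x, y in zip(X, Y)], axis=0)
    return theta - eta * np.linalg.solve(F + damping * np.eye(len(theta)), g)

# Toy teacher data and a short batch-mode training run.
X = rng.normal(size=(200, n_in))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

theta = rng.normal(scale=0.5, size=n_w + n_hid)
for _ in range(50):
    theta = natural_step(theta, X, Y, layered=True)
```

Under these assumptions, the same (approximate) Fisher matrix that preconditions the learning step can also rank weights for pruning, as in optimal brain surgeon, where the saliency of weight i is g-free and given by θᵢ² / (2 [F⁻¹]ᵢᵢ) at a minimum of the error.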