Accelerated Backpropagation Learning: Extended Dynamic Parallel Tangent Optimization Algorithm
AI '00: Proceedings of the 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, Advances in Artificial Intelligence
In gradient-based learning algorithms, a momentum term usually improves the convergence rate and reduces zigzagging, but it can also slow convergence. The parallel tangent (ParTan) gradient is used here as a deflecting method to improve convergence. From an implementation point of view it is as simple as momentum; in fact, it is one of the more practical implementations of the conjugate gradient method. ParTan overcomes the zigzagging inefficiency of conventional backpropagation by deflecting the gradient through an acceleration phase. In this paper we use two learning rates: η for the gradient search direction and µ for the accelerating direction along the parallel tangent. Moreover, an improved version of dynamic self-adaptation of η and µ is used to improve the parallel tangent gradient learning method. In dynamic self-adaptation, each learning rate is adapted locally to the landscape of the cost function and to its own previous value. Finally, we test the proposed algorithm on several MLP networks: XOR (2×2×1), encoder (16×4×16), and parity (4×4×1). We compare the results with dynamic self-adaptation of the gradient learning rate with momentum (DSη-α) and parallel tangent with dynamic self-adaptation (PTDSη-µ). Experimental results show that the average number of epochs decreases to around 66% and 50% of that required by DSη-α and PTDSη-µ, respectively. Moreover, the proposed algorithm shows good ability to escape local minima.
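To make the two-phase scheme concrete, here is a minimal Python/NumPy sketch of ParTan gradient descent with dynamically self-adapted rates, in the spirit of the abstract. It is not the authors' reference implementation: the function names (adapt_step, partan_ds), the multiplicative adaptation factor ξ = 1.7, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def adapt_step(w, d, rate, loss, xi=1.7):
    # Dynamic self-adaptation (after Salomon & van Hemmen's scheme):
    # try the current rate scaled up and down by xi along direction d,
    # and keep whichever candidate point has the lower cost.
    # xi = 1.7 is an assumed, illustrative value.
    cand = [rate * xi, rate / xi]
    pts = [w + r * d for r in cand]
    best = int(loss(pts[1]) < loss(pts[0]))
    return pts[best], cand[best]

def partan_ds(w0, loss, grad, eta=0.05, mu=0.05, n_iter=500):
    # Two-phase ParTan descent with separately adapted learning rates:
    # eta for the gradient phase, mu for the acceleration phase.
    w, w_prev = np.asarray(w0, dtype=float), None
    for _ in range(n_iter):
        # Gradient phase: step along the negative gradient with rate eta.
        z, eta = adapt_step(w, -grad(w), eta, loss)
        # Acceleration phase: deflect along the parallel-tangent
        # direction joining z to the point two phases back.
        if w_prev is not None:
            z, mu = adapt_step(z, z - w_prev, mu, loss)
        w_prev, w = w, z
    return w

# Usage on a simple ill-conditioned quadratic, where plain gradient
# descent with momentum tends to zigzag across the narrow valley:
if __name__ == "__main__":
    A = np.diag([1.0, 25.0])
    loss = lambda w: 0.5 * w @ A @ w
    grad = lambda w: A @ w
    print(partan_ds(np.array([5.0, 1.0]), loss, grad))
```

The design point the sketch illustrates is that each phase keeps its own rate: the gradient rate η tracks the local curvature along the steepest-descent direction, while µ adapts independently to the longer-range valley direction that the parallel-tangent step exploits.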