Supervised and unsupervised co-training of adaptive activation functions in neural nets
PSL'11: Proceedings of the First IAPR TC3 Conference on Partially Supervised Learning
Standard feedforward neural networks benefit from the nice theoretical properties of mixtures of sigmoid activation functions, but they may fail in several practical learning tasks that would be better tackled with a more appropriate, problem-specific basis of activation functions. This paper presents a connectionist model that exploits adaptive activation functions. Each hidden unit in the network is associated with a specific pair (f(.), p(.)), where f(.) is the activation function and p(.) is the likelihood that the unit is relevant to the computation of the network output on the current input. The function f(.) is optimized in a supervised manner, while p(.) is realized via a statistical parametric model learned through unsupervised (or partially supervised) estimation. Since f(.) and p(.) influence each other's learning processes, the overall machine is implicitly a co-trained, coupled model and, in turn, a flexible, non-standard neural architecture. The feasibility of the approach is corroborated by empirical evidence from computer simulations of regression and classification tasks.
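The abstract does not fix the parametric forms of f(.) and p(.). The following Python/PyTorch sketch is one hypothetical instantiation, not the authors' implementation: it assumes f(.) is a sigmoid with a trainable amplitude, p(.) is a univariate Gaussian over the unit's net input rescaled to (0, 1], and the two are co-trained by summing a supervised loss with an unsupervised likelihood term. All class and parameter names below are illustrative assumptions.

    # Minimal sketch (assumptions, not the paper's method): each hidden unit
    # couples an adaptive activation f(.) with a relevance model p(.).
    import torch
    import torch.nn as nn

    class CoTrainedAdaptiveLayer(nn.Module):
        def __init__(self, in_features, hidden):
            super().__init__()
            self.lin = nn.Linear(in_features, hidden)
            self.amplitude = nn.Parameter(torch.ones(hidden))   # trainable amplitude of f(.)
            self.mu = nn.Parameter(torch.zeros(hidden))         # mean of the Gaussian p(.)
            self.log_sigma = nn.Parameter(torch.zeros(hidden))  # log std of the Gaussian p(.)

        def forward(self, x):
            net = self.lin(x)                          # per-unit net input
            f = self.amplitude * torch.sigmoid(net)    # adaptive activation f(.)
            z = (net - self.mu) / self.log_sigma.exp()
            p = torch.exp(-0.5 * z ** 2)               # relevance p(.) in (0, 1]
            return p * f                               # relevance-gated unit output

        def nll(self, x):
            # Unsupervised term: negative log-likelihood of the net inputs
            # under each unit's Gaussian, used to estimate (mu, sigma).
            net = self.lin(x).detach()
            z = (net - self.mu) / self.log_sigma.exp()
            return (0.5 * z ** 2 + self.log_sigma).mean()

    # One co-training step: the supervised loss shapes f(.), the likelihood
    # term fits p(.), and the two interact through the gated forward pass.
    layer, head = CoTrainedAdaptiveLayer(4, 8), nn.Linear(8, 1)
    opt = torch.optim.Adam([*layer.parameters(), *head.parameters()], lr=1e-2)
    x, y = torch.randn(32, 4), torch.randn(32, 1)
    loss = nn.functional.mse_loss(head(layer(x)), y) + 0.1 * layer.nll(x)
    opt.zero_grad(); loss.backward(); opt.step()

Because the relevance gate p(.) multiplies f(.) in the forward pass, gradients of the supervised loss also reach the Gaussian parameters, which is one plausible way to realize the mutual influence between f(.) and p(.) described in the abstract.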