Novel maximum-margin training algorithms for supervised neural networks

Authors:
Oswaldo Ludwig; Urbano Nunes
Affiliations:
Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra, Polo II, Coimbra, Portugal (both authors)
Venue:
IEEE Transactions on Neural Networks
Year:
2010

Abstract

This paper proposes three novel training methods for multilayer perceptron (MLP) binary classifiers: two based on the backpropagation approach and a third based on information theory. Both backpropagation methods are based on the maximal-margin (MM) principle. The first one, based on the gradient descent with adaptive learning rate algorithm (GDX) and named maximum-margin GDX (MMGDX), directly increases the margin of the MLP output-layer hyperplane. The proposed method jointly optimizes both MLP layers in a single process, backpropagating the gradient of an MM-based objective function through the output and hidden layers in order to create a hidden-layer space that enables a larger margin for the output-layer hyperplane. This avoids testing many arbitrary kernels, as occurs in support vector machine (SVM) training. The proposed MM-based objective function aims to stretch the margin to its limit. An objective function based on the $L_p$-norm is also proposed to take into account the idea of support vectors while avoiding the complexity of solving a constrained optimization problem, as is usual in SVM training. In fact, all the training methods proposed in this paper have time and space complexities of $O(N)$, whereas usual SVM training methods have time complexity $O(N^3)$ and space complexity $O(N^2)$, where $N$ is the training-data-set size. The second approach, named minimization of interclass interference (MICI), has an objective function inspired by Fisher discriminant analysis. This algorithm aims to create an MLP hidden-layer output in which the patterns have a desirable statistical distribution. In both training methods, the maximum area under the ROC curve (AUC) is used as the stopping criterion. The third approach offers a robust training framework able to take the best of each proposed training method.
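To make the margin idea concrete, the sketch below computes the geometric margin of an output-layer hyperplane $(w, b)$ over hidden-layer outputs $h_i$ with labels $y_i \in \{-1, +1\}$, that is $\min_i y_i (w^\top h_i + b) / \lVert w \rVert$. The abstract does not give the exact form of the MM-based or $L_p$-norm objectives, so `soft_margin_lp` is only one hypothetical way an $L_p$-norm can bring in support vectors: the reciprocal of the $L_p$-norm of the inverse margins, a smooth quantity that every sample influences, dominated by the samples closest to the hyperplane. All function names here are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def margins(H: Sequence[Sequence[float]], y: Sequence[int],
            w: Sequence[float], b: float) -> List[float]:
    """Signed distances y_i * (w . h_i + b) / ||w|| of every sample to the hyperplane."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    return [yi * (sum(wj * hj for wj, hj in zip(w, h)) + b) / norm_w
            for h, yi in zip(H, y)]


def geometric_margin(H, y, w, b) -> float:
    """Hard margin: the smallest signed distance (negative if any sample is misclassified)."""
    return min(margins(H, y, w, b))


def soft_margin_lp(H, y, w, b, p: float = 8.0) -> float:
    """Hypothetical smooth surrogate: 1 / ||(1/m_1, ..., 1/m_N)||_p.

    For positive margins this never exceeds the hard margin and tends to it as p
    grows; samples near the hyperplane (support-vector-like) dominate the norm.
    """
    m = margins(H, y, w, b)
    if min(m) <= 0:
        raise ValueError("surrogate assumes every sample is on its correct side")
    return 1.0 / sum((1.0 / mi) ** p for mi in m) ** (1.0 / p)
```

A larger `p` makes the surrogate track the hard margin more closely, while a smaller `p` spreads the gradient over more samples; maximizing either value with respect to both $w$ and the hidden-layer weights is the kind of objective the abstract describes backpropagating.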
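The abstract only says that the MICI objective is inspired by Fisher discriminant analysis; it does not state the objective itself. For reference, the sketch below computes the classic Fisher criterion for one projected feature, $(\mu_+ - \mu_-)^2 / (\sigma_+^2 + \sigma_-^2)$, which grows when the class means move apart relative to the within-class spread. It is the textbook criterion, not the MICI objective, and `fisher_ratio` is a hypothetical name.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_ratio(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Classic Fisher criterion of a 1-D projection: (m+ - m-)^2 / (s+^2 + s-^2).

    labels are +1 / -1; a larger value means the two classes are better
    separated relative to their spread.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    within = pvariance(pos) + pvariance(neg)  # population variances of each class
    if within == 0:
        raise ValueError("both classes have zero variance; ratio is undefined")
    return (mean(pos) - mean(neg)) ** 2 / within
```

Applied to a hidden neuron's outputs, a criterion of this family rewards hidden representations in which the two classes form compact, well-separated clusters, which matches the abstract's stated aim of a "desirable statistical distribution" in the hidden layer.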
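Using the maximum AUC as a stopping criterion can be sketched as follows: after each epoch, compute the AUC on held-out data and keep the weight snapshot with the highest value. The AUC is computed from the rank-sum (Mann-Whitney) statistic, which takes $O(N \log N)$ time because of the sort. The `patience` rule for ending training, the callback names `train_epoch` and `predict`, and the validation split are assumptions for illustration; the abstract does not specify them.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the rank-sum (Mann-Whitney) statistic; tied scores get average ranks."""
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie run
        i = j + 1
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def train_with_auc_stopping(
    train_epoch: Callable[[int], Model],
    predict: Callable[[Model], Sequence[float]],
    val_labels: Sequence[int],
    max_epochs: int = 100,
    patience: int = 10,
) -> Tuple[Optional[Model], float]:
    """Run epochs, keep the snapshot with the highest validation AUC, and stop
    after `patience` epochs without improvement (the patience rule is an assumption)."""
    best_model: Optional[Model] = None
    best_auc = -1.0
    stale = 0
    for epoch in range(max_epochs):
        model = train_epoch(epoch)  # should return a copy of the weights, not a live reference
        current = auc(predict(model), val_labels)
        if current > best_auc:
            best_model, best_auc, stale = model, current, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_model, best_auc
```

Because AUC depends only on how the classifier ranks samples, it is insensitive to the threshold on the output neuron, which is one reason it suits binary classifiers whose output scale changes during training.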
The main idea is to compose a neural model using neurons extracted from three other neural networks, each previously trained by MICI, MMGDX, and Levenberg-Marquardt (LM), respectively. The resulting neural network is named the assembled neural network (ASNN). Benchmark data sets of real-world problems were used in experiments that enable a comparison with other state-of-the-art classifiers. The results provide evidence of the effectiveness of our methods in terms of accuracy, AUC, and balanced error rate.
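The abstract does not say how neurons are chosen from each network or how the ASNN output layer is obtained. The sketch below illustrates only the assembly step under stated assumptions: each donor network's hidden layer is a list of `(input_weights, bias)` neurons, hypothetical `picks` indices select which neurons to copy, and the copies are stacked into one hidden layer evaluated with a `tanh` activation (also an assumption). Fitting an output layer on the assembled hidden outputs would still be required and is not shown.

```python
import math
from typing import List, Sequence, Tuple

Neuron = Tuple[List[float], float]  # (input weights, bias) of one hidden unit


def assemble_hidden_layer(donors: Sequence[Sequence[Neuron]],
                          picks: Sequence[Sequence[int]]) -> List[Neuron]:
    """Concatenate chosen hidden neurons from several donor networks.

    donors[k] is the hidden layer of the k-th trained network (e.g. MICI, MMGDX,
    LM); picks[k] lists the indices taken from it (hypothetical selection).
    """
    if len(donors) != len(picks):
        raise ValueError("need one index list per donor network")
    layer: List[Neuron] = []
    for hidden, idx in zip(donors, picks):
        # copy the weight lists so later training cannot mutate the donors
        layer.extend((list(hidden[i][0]), hidden[i][1]) for i in idx)
    return layer


def hidden_output(layer: Sequence[Neuron], x: Sequence[float]) -> List[float]:
    """tanh activations of the assembled hidden layer for one input vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in layer]
```

Since every donor was trained on the same inputs, their hidden neurons share one input space, so copying them side by side yields a valid hidden layer whose features combine the three training criteria.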