Recently, one of the standard discriminative training methods for pattern classifier design, Minimum Classification Error (MCE) training, has been revised; the new version is called Large Geometric Margin Minimum Classification Error (LGM-MCE) training. It is formulated by replacing the conventional misclassification measure, which is equivalent to the so-called functional margin, with a geometric margin that represents the geometric distance between an estimated class boundary and its closest training sample. It seeks the classifier parameter status that simultaneously minimizes the empirical average classification error count loss and maximizes the geometric margin. Experimental evaluations showed the fundamental utility of LGM-MCE training. However, to increase its effectiveness, this new training method required careful setting of hyperparameters, especially the smoothness degree of the smooth classification error count loss. Exploring the smoothness degree usually requires many trial-and-error repetitions of training and testing, and such burdensome repetition does not necessarily lead to an optimal smoothness setting. To alleviate this problem and further increase the effect of geometric margin employment, we apply in this paper a new idea that automatically determines the loss smoothness of LGM-MCE training. We first introduce a new formalization of LGM-MCE training using the Parzen estimation of error count risk, and from it we formalize LGM-MCE training that incorporates a mechanism of automatic loss smoothness determination. Importantly, the geometric-margin-based misclassification measure adopted in LGM-MCE training is directly linked with the geometric margin in the pattern sample space. Based on this relation, we also prove that loss smoothness affects the production of virtual samples along the estimated class boundaries in the pattern sample space.
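To make the concepts concrete, the following is a minimal sketch (not the authors' implementation) of the quantities the abstract refers to, for a hypothetical linear multi-class classifier with discriminants g_k(x) = w_k · x + b_k: the functional-margin-style misclassification measure, its geometric-margin normalization (signed distance to the estimated class boundary), a sigmoid-smoothed error count loss whose smoothness is controlled by a parameter alpha, and one plausible data-driven choice of alpha via a Parzen-style bandwidth (Silverman's rule, our assumption, not the paper's exact mechanism).

```python
import numpy as np

# Hypothetical linear classifier: one weight row and bias per class.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # illustrative values only
b = np.zeros(3)

def misclassification_measure(x, y):
    """Functional-margin-style measure: positive => x is misclassified."""
    g = W @ x + b
    rival = np.max(np.delete(g, y))          # best competing class score
    return rival - g[y]

def geometric_measure(x, y):
    """Geometric-margin version: normalize by the weight-difference norm so
    the measure equals the signed distance from x to the estimated
    boundary between class y and its strongest rival."""
    g = W @ x + b
    rivals = np.delete(np.arange(len(g)), y)
    j = rivals[np.argmax(g[rivals])]
    return (g[j] - g[y]) / np.linalg.norm(W[j] - W[y])

def smooth_error_loss(d, alpha):
    """Sigmoid-smoothed 0-1 classification error count loss; alpha is the
    loss smoothness degree (larger alpha => closer to the hard 0-1 count)."""
    return 1.0 / (1.0 + np.exp(-alpha * d))

def auto_smoothness(measures):
    """One illustrative automatic choice: treat the smooth loss as a Parzen
    kernel over measure values and set alpha to the inverse of a
    Silverman-rule bandwidth (an assumption for this sketch)."""
    h = 1.06 * np.std(measures) * len(measures) ** (-0.2)
    return 1.0 / h
```

For a correctly classified sample the geometric measure is non-positive, and a smaller alpha spreads the loss over samples farther from the boundary, which is consistent with the abstract's observation that loss smoothness governs how virtual samples are produced along the estimated class boundaries.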
Finally, through experimental evaluations and comparisons with other training methods, we elaborate the characteristics of LGM-MCE training and its new function that automatically determines an appropriate loss smoothness degree.