Robust ASR using Support Vector Machines

Authors:
R. Solera-Ureña;D. Martín-Iglesias;A. Gallardo-Antolín;C. Peláez-Moreno;F. Díaz-de-María
Affiliations:
Signal Theory and Communications Department, EPS-Universidad Carlos III de Madrid, Avda. Universidad, 30, Leganés 28911, Spain (all authors)
Venue:
Speech Communication
Year:
2007


Abstract

Support Vector Machines (SVMs) have better theoretical properties than other machine learning alternatives, owing to their max-margin training paradigm, and this has led us to propose them as a suitable technique for robust speech recognition. However, several shortcomings have to be overcome first, the most important being the need to normalise the time duration of different realisations of the acoustic speech units. In this paper, we compare two approaches in noisy environments: first, a hybrid HMM-SVM solution in which a fixed number of frames is selected by means of an HMM segmentation; and second, a normalisation kernel called the Dynamic Time Alignment Kernel (DTAK), first introduced by Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841-1844] and based on Dynamic Time Warping (DTW). Special attention is paid to adapting both alternatives to noisy environments, by comparing two types of parameterisation and applying suitable feature normalisation. The results show that the DTAK provides important advantages over the baseline HMM system in moderate to severe noise conditions, and also outperforms the hybrid system.
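
To illustrate how a DTW-based kernel can compare acoustic units of different durations, the following is a minimal sketch of a dynamic-time-alignment style kernel. It is not the authors' implementation: the RBF frame-level kernel, the `gamma` value, the symmetric path weights (1 for horizontal and vertical steps, 2 for diagonal steps) and the function name `dtw_alignment_kernel` are all assumptions chosen for illustration.

```python
import numpy as np


def dtw_alignment_kernel(X, V, gamma=0.1):
    """Similarity between two frame sequences of possibly different lengths.

    X has shape (n, d) and V has shape (m, d); each row is one feature frame.
    The frame-level kernel is an RBF (an assumption made for this sketch).
    """
    n, m = len(X), len(V)
    # Frame-level kernel values k[i, j] between every frame of X and of V.
    sq_dist = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-gamma * sq_dist)

    # G[i, j] holds the best accumulated (weighted) similarity of any
    # alignment path that ends by matching frame i of X with frame j of V.
    G = np.full((n + 1, m + 1), -np.inf)
    G[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = k[i - 1, j - 1]
            G[i, j] = max(
                G[i - 1, j] + local,            # advance in X only
                G[i - 1, j - 1] + 2.0 * local,  # advance in both sequences
                G[i, j - 1] + local,            # advance in V only
            )
    # With these path weights every full path has total weight n + m,
    # so dividing by n + m normalises for duration.
    return G[n, m] / (n + m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))       # 5 frames, 3 features each
    V = np.repeat(X, 2, axis=0)       # the same unit spoken twice as slowly
    print(dtw_alignment_kernel(X, V))
```

Under these assumptions, a sequence and a time-stretched copy of it receive the maximum similarity, which is the duration normalisation the abstract refers to. A Gram matrix of such values could in principle be supplied to an SVM trainer that accepts precomputed kernels, although a kernel defined through a maximisation over alignment paths is not guaranteed to be positive semidefinite.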