We describe the use of support vector machines (SVMs) for continuous speech recognition by incorporating them into segmental minimum Bayes risk decoding. Lattice cutting is used to convert the automatic speech recognition (ASR) search space into sequences of smaller recognition problems. SVMs are then trained as discriminative models over each of these problems and used in a rescoring framework. We pose the estimation of a posterior distribution over hypotheses in these regions of acoustic confusion as a logistic regression problem. We also show that GiniSVMs can be used as an approximation technique to estimate the parameters of the logistic regression problem. On a small-vocabulary recognition task we show that the use of GiniSVMs can improve the performance of a well-trained hidden Markov model system estimated under the Maximum Mutual Information criterion. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We discuss the problems that we expect to encounter in extending this approach to large-vocabulary continuous speech recognition and describe an initial investigation of constrained estimation techniques to derive feature spaces for SVMs.
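To make the logistic-regression view concrete: after lattice cutting, each region of acoustic confusion reduces to a small classification problem over a handful of competing word hypotheses, and the posterior over those hypotheses can be estimated by a logistic model on segment features. The sketch below is purely illustrative and not the paper's system; the synthetic feature vectors, the two-way confusion pair, and the plain gradient-descent fit are all assumptions standing in for the acoustic scores and GiniSVM estimation used in the actual work.

```python
import numpy as np

# Hypothetical two-way confusion region: hypotheses word_0 vs word_1.
# Each training segment is represented by a synthetic 4-dim feature vector.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 4)),   # segments labeled word_0
               rng.normal(+1.0, 1.0, size=(100, 4))])  # segments labeled word_1
y = np.concatenate([np.zeros(100), np.ones(100)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit the logistic model P(word_1 | x) = sigmoid(w.x + b)
# by gradient descent on the average logistic loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

# Posterior over the confusion pair for a new segment; in a rescoring
# framework this posterior would replace or combine with the lattice scores.
x_new = np.array([0.8, 1.1, 0.9, 1.2])
posterior_word1 = sigmoid(x_new @ w + b)
```

In the paper's setting the binary case generalizes to multiclass confusion sets, and GiniSVMs serve as the approximation technique for estimating the regression parameters rather than the direct gradient fit shown here.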