Incremental word learning: Efficient HMM initialization and large margin discriminative adaptation

  • Authors:
  • Irene Ayllón Clemente; Martin Heckmann; Britta Wrede

  • Affiliations:
  • Bielefeld University, Research Institute for Cognition and Robotics - CoR-Lab, D-33615 Bielefeld, Germany and Honda Research Institute Europe GmbH, D-63073 Offenbach am Main, Germany; Honda Research Institute Europe GmbH, D-63073 Offenbach am Main, Germany; Bielefeld University, Research Institute for Cognition and Robotics - CoR-Lab, D-33615 Bielefeld, Germany

  • Venue:
  • Speech Communication
  • Year:
  • 2012


Abstract

In this paper we present an incremental word learning system that can cope with very few training samples, enabling speech acquisition during on-line human-robot interaction. As in most automatic speech recognition (ASR) systems, our architecture relies on a Hidden Markov Model (HMM) framework in which the individual word models are trained sequentially and the system has little prior knowledge. Good HMM performance depends on the amount of training data, the initialization procedure, and the efficiency of the discriminative training algorithms, so we propose several approaches to improve the system at each of these stages. One major problem when training on a small amount of data is over-fitting; hence we present a novel estimation of the variance floor that depends on the number of available training samples. Next, we propose a bootstrapping approach to obtain a good initialization of the HMM parameters, based on unsupervised training followed by the construction of a new HMM from aligned and merged Viterbi-decoded sequences. Finally, we investigate large margin discriminative training techniques to improve the generalization of the models, using several strategies suited to limited training data. In the evaluation we examine the contribution of each proposed stage to the overall system performance, comparing our techniques with state-of-the-art methods and investigating how far the number of training samples can be reduced. We evaluate our algorithms on isolated and continuous digit recognition tasks. In summary, the proposed algorithms yield significant improvements and are a step towards efficient learning from few examples.
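
To illustrate the idea of a variance floor that depends on the number of available training samples, the following is a minimal Python sketch. It assumes MFCC-like feature frames and diagonal-covariance Gaussians per HMM state; the particular scaling schedule (the base_scale value and the 1/sqrt(n) decay) is a hypothetical choice for illustration only, not the estimator derived in the paper.

    import numpy as np

    def variance_floor(features, n_samples, base_scale=0.01, min_scale=0.001):
        """Sample-count-dependent variance floor (illustrative sketch).

        features  : (T, D) array of training frames pooled for the word model.
        n_samples : number of available training utterances.
        The floor is a fraction of the global per-dimension feature variance;
        the fraction shrinks as more samples become available (hypothetical
        schedule, not the paper's estimator).
        """
        global_var = np.var(features, axis=0)                 # per-dimension variance
        scale = max(min_scale, base_scale / np.sqrt(n_samples))
        return scale * global_var

    def apply_floor(state_vars, floor):
        """Clip per-state Gaussian variances from below to limit over-fitting."""
        return np.maximum(state_vars, floor)

    # Usage: floor the variances of a 5-state model with 13-dimensional features.
    feats = np.random.randn(500, 13)                          # stand-in feature frames
    floor = variance_floor(feats, n_samples=3)
    state_vars = np.abs(np.random.randn(5, 13)) * 0.05        # stand-in state variances
    state_vars = apply_floor(state_vars, floor)

With only a handful of utterances, per-state variance estimates can collapse towards zero; flooring them against a scaled global variance keeps the Gaussians broad enough to generalize, and letting the floor decrease with the sample count relaxes this constraint as more data arrives.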