Active learning for acoustic speech recognition modeling

  • Authors:
  • Gerald G. Meyer; Teresa M. Kamm

  • Year:
  • 2004

Abstract

In this work, we investigate a machine learning approach to cost-effectively train acoustic models for speech recognition. More specifically, we use an active learning method that lets the learner control what new data is introduced into training, so that we can selectively invest in the resources needed to provide the truth labels the models require. We propose a two-pronged approach to improving speech recognition performance through the selective use of training data. First, we make effective use of the available transcribed data by using only those examples that are likely to improve system performance. Second, we focus future transcription effort on the data with the greatest potential to improve performance. Our approach can select a set of data from which to build a recognition system that outperforms a system built on a larger, randomly selected set. We begin our investigation of the proposed data-selective methods on a simple alphadigit recognition problem. We demonstrate both a model-selective and a sequence-selective approach appropriate for situations in which whole words are modeled independently of one another, in one case showing that we can improve system performance by selectively using 35% (17.5 hours) of a 50-hour training set. In particular, we reduce the error rate from 10.3% (achieved when training on the full 50 hours) to 9.0%, without prior knowledge of the true transcription. Finally, we demonstrate, on the more realistic Wall Street Journal corpus, a word-selective method suitable for typical speech recognition applications, showing that the selective algorithms meet or beat the error rate predicted for systems trained on a similar amount of randomly selected data.
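As a rough illustration of the kind of selection loop the abstract describes, the sketch below ranks untranscribed utterances by a seed recognizer's confidence and sends the least-confident ones for transcription. It is a minimal sketch, not the authors' algorithm: the `recognize` interface, the `Hypothesis` type, the selection `budget`, and the confidence `threshold` are all hypothetical stand-ins.

```python
# Minimal sketch of confidence-based data selection for acoustic model
# training, in the spirit of the active learning approach described above.
# NOTE: this is an illustration, not the authors' algorithm. The
# `recognize` function, `Hypothesis` type, `budget`, and `threshold`
# are hypothetical stand-ins for a real seed recognizer and its tuning.
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Hypothesis:
    text: str          # the seed recognizer's best transcription guess
    confidence: float  # posterior-style confidence score in [0.0, 1.0]

def recognize(utterance: str) -> Hypothesis:
    """Placeholder for decoding one utterance with a seed acoustic model."""
    raise NotImplementedError("plug in a real recognizer here")

def select_for_transcription(utterances: Iterable[str],
                             budget: int,
                             threshold: float = 0.5) -> List[str]:
    """Pick up to `budget` utterances the seed model is least sure about.

    Low-confidence utterances are the ones whose true transcriptions are
    most likely to add information the current model does not already
    capture, so transcription effort is focused there.
    """
    scored = [(recognize(u).confidence, u) for u in utterances]
    scored.sort(key=lambda pair: pair[0])  # least confident first
    return [u for conf, u in scored[:budget] if conf < threshold]
```

With a trained seed model plugged in for `recognize`, such a loop would alternate between selecting a batch, obtaining its true transcriptions, retraining the acoustic model, and re-scoring the remaining untranscribed pool.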