Toward Robust Speech Recognition and Understanding

  • Author: Sadaoki Furui
  • Affiliation: Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan 152-8552
  • Venue: Journal of VLSI Signal Processing Systems
  • Year: 2005

Abstract

The principal cause of speech recognition errors is a mismatch between the trained acoustic/language models and the input speech, arising because the amount of training data is limited relative to the vast variation of speech. It is therefore crucial to establish methods that are robust against voice variation due to speaker individuality, the physical and psychological condition of the speaker, telephone sets, microphones, network characteristics, additive background noise, speaking styles, and other factors. This paper surveys robust architectures and modeling techniques for speech recognition and understanding. The topics include acoustic and language modeling for spontaneous speech recognition, unsupervised adaptation of acoustic and language models, robust architectures for spoken dialogue systems, multi-modal speech recognition, and speech summarization. The paper also discusses the most important research problems that must be solved to achieve ultimately robust speech recognition and understanding systems.