Unsupervised speaker adaptation for telephone call transcription

  • Authors:
  • R. Wallace; K. Thambiratnam; F. Seide

  • Affiliations:
  • Speech and Audio Research Laboratory, Queensland University of Technology, 2 George Street, Brisbane, Australia; Microsoft Research Asia, 5F Sigma Center, 49 Zhi Chun Road, Beijing, China 100080; Microsoft Research Asia, 5F Sigma Center, 49 Zhi Chun Road, Beijing, China 100080

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of untranscribed speech for a particular speaker. This paper investigates how best to exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model (LM) adaptation are presented. Using one hour of automatically transcribed speech per speaker with a word error rate of 36.0%, unsupervised adaptation resulted in an absolute gain of 6.3%, equivalent to 70% of the gain from the supervised case, with additional adaptation data likely to yield further improvements. LM adaptation experiments suggested that although there seems to be a small degree of speaker idiolect, adaptation to the speaker alone, without considering the topic of the conversation, is in itself unlikely to improve transcription accuracy.
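
The abstract gives the unsupervised gain (6.3% absolute) and says it is 70% of the supervised gain, but does not state the supervised gain itself. The short sketch below backs that figure out by simple division. The function name `implied_supervised_gain` is hypothetical, and the result is only approximate because the 70% figure is presumably rounded; the paper itself should be consulted for the exact number.

```python
def implied_supervised_gain(unsupervised_gain: float, fraction_of_supervised: float) -> float:
    """Back out the supervised gain from the unsupervised gain and its stated share.

    Both gains are absolute percentage points. This is an illustrative helper,
    not something defined in the paper.
    """
    if not 0 < fraction_of_supervised <= 1:
        raise ValueError("fraction_of_supervised must be in (0, 1]")
    return unsupervised_gain / fraction_of_supervised


# Figures quoted in the abstract.
unsupervised_gain = 6.3   # absolute gain from unsupervised adaptation
fraction = 0.70           # "equivalent to 70% of the gain from the supervised case"

supervised_gain = implied_supervised_gain(unsupervised_gain, fraction)
print(f"Implied supervised gain: about {supervised_gain:.1f} points absolute")
```

Note that the 36.0% word error rate describes the automatic transcripts used as adaptation data, so it is not used here as the baseline against which the gain is measured.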