Using prosodic features in language models for meetings

  • Authors:
  • Songfang Huang; Steve Renals

  • Affiliations:
  • The Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK (both authors)

  • Venue:
  • MLMI'07: Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2007

Abstract

Prosody has been actively studied as an important knowledge source for speech recognition and understanding. In this paper, we address the question of how to exploit prosody in language models to aid automatic speech recognition in the context of meetings. Using an automatic syllable detection algorithm, we extract syllable-based prosodic features to form a prosodic representation for each word. We then investigate two modeling approaches. The first is based on a factored language model, which uses the prosodic representation directly and treats it as a 'word'. The second, rather than associating words and prosody directly, provides a richer probabilistic structure within a hierarchical Bayesian framework by introducing an intermediate latent variable that represents similar prosodic patterns shared by groups of words. Fourfold cross-validation experiments on the ICSI Meeting Corpus show that exploiting prosody for language modeling can significantly reduce perplexity and also yields marginal reductions in word error rate.
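
To make the idea of treating prosody as an extra conditioning "word" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy bigram model with add-alpha smoothing, a made-up binary prosody label ("H" or "L") per word, and hypothetical helper names (`train`, `perplexity`). The paper's factored language model and its syllable-based features are considerably richer. Perplexity here is the exponential of the negative mean log-probability, and it is measured on the training samples themselves, so the numbers illustrate the mechanics only.

```python
import math
from collections import Counter


def train(samples, vocab, alpha=1.0):
    """Fit an add-alpha smoothed conditional model P(word | context).

    samples: list of (context, word) pairs; a context is any hashable value.
    Returns a function prob(word, context).
    """
    pair_counts = Counter(samples)
    ctx_counts = Counter(ctx for ctx, _ in samples)
    size = len(vocab)

    def prob(word, ctx):
        return (pair_counts[(ctx, word)] + alpha) / (ctx_counts[ctx] + alpha * size)

    return prob


def perplexity(prob, samples):
    """exp of the negative average log-probability over the samples."""
    total = sum(math.log(prob(word, ctx)) for ctx, word in samples)
    return math.exp(-total / len(samples))


# Hypothetical toy data: (previous word, prosody label of current word, current word).
data = [
    ("<s>", "H", "okay"), ("okay", "L", "so"), ("so", "L", "the"),
    ("the", "H", "meeting"), ("<s>", "L", "so"), ("so", "H", "okay"),
    ("okay", "L", "the"), ("the", "H", "agenda"),
]
vocab = sorted({word for _, _, word in data})

# Baseline: condition on the previous word only.
word_only = [(prev, word) for prev, _, word in data]
# Factored variant: the prosody label is one more conditioning factor.
with_prosody = [((prev, pros), word) for prev, pros, word in data]

baseline = train(word_only, vocab)
factored = train(with_prosody, vocab)

print("baseline perplexity:", perplexity(baseline, word_only))
print("with prosody factor:", perplexity(factored, with_prosody))
```

Because the context is just a hashable key, adding the prosody label only changes how the context tuple is built. A real system would also need a backoff strategy for sparse word and prosody combinations, which this sketch omits.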