Using prosodic features in language models for meetings

Authors:
Songfang Huang;Steve Renals
Affiliations:
The Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK;The Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
Venue:
MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
Year:
2007

Citing 8
Cited 0

Prosody-based automatic segmentation of speech into sentences and topics

Speech Communication - Special issue on accessing information in spoken audio
Automatic syllable detection for vowel landmarks

Automatic syllable detection for vowel landmarks
Latent dirichlet allocation

The Journal of Machine Learning Research
The author-topic model for authors and documents

UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Factored language models and generalized parallel backoff

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Prosody/parse scoring and its application in ATIS

HLT '93 Proceedings of the workshop on Human Language Technology
A hierarchical Bayesian language model based on Pitman-Yor processes

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
The development of the AMI system for the transcription of speech in meetings

MLMI'05 Proceedings of the Second international conference on Machine Learning for Multimodal Interaction

Quantified Score

Hi-index	0.00

Visualization

Abstract

Prosody has been actively studied as an important knowledge source for speech recognition and understanding. In this paper, we are concerned with the question of exploiting prosody for language models to aid automatic speech recognition in the context of meetings. Using an automatic syllable detection algorithm, the syllable-based prosodic features are extracted to form the prosodic representation for each word. Two modeling approaches are then investigated. One is based on a factored language model, which directly uses the prosodic representation and treats it as a 'word'. Instead of direct association, the second approach provides a richer probabilistic structure within a hierarchical Bayesian framework by introducing an intermediate latent variable to represent similar prosodic patterns shared by groups of words. Fourfold cross-validation experiments on the ICSI Meeting Corpus show that exploiting prosody for language modeling can significantly reduce the perplexity, and also have marginal reductions in word error rate.