Style & topic language model adaptation using HMM-LDA

  • Authors:
  • Bo-June (Paul) Hsu; James Glass

  • Affiliations:
  • MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA (both authors)

  • Venue:
  • EMNLP '06: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2006

Abstract

Adapting language models across styles and topics, such as for lecture transcription, involves combining generic style models with topic-specific content relevant to the target document. In this work, we investigate the use of the Hidden Markov Model with Latent Dirichlet Allocation (HMM-LDA) to obtain syntactic state and semantic topic assignments for word instances in the training corpus. From these context-dependent labels, we construct style and topic models that better match the target document, and extend the traditional bag-of-words topic models to n-grams. Experiments with static model interpolation yielded relative reductions in perplexity and word error rate (WER) of 7.1% and 2.1%, respectively, over an adapted trigram baseline. Adaptive interpolation of mixture components further reduced perplexity by 9.5% and WER by a modest 0.3%.
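As a rough illustration of the static interpolation step the abstract describes, the Python sketch below mixes two toy component models with fixed weights and scores a document by perplexity. Everything here is invented for illustration: the vocabulary, the unigram stand-ins `style_lm` and `topic_lm`, and the 0.6/0.4 weights are hypothetical placeholders for the paper's full style and topic n-gram models, whose weights would be tuned on held-out data.

```python
import math

# Toy vocabulary and add-alpha unigram tables standing in for the
# paper's generic-style and topic-specific n-gram models (hypothetical).
VOCAB = ["the", "lecture", "covers", "markov", "models", "today"]

def make_unigram_lm(counts, alpha=0.1):
    """Build a smoothed unigram model p(word | history); history is ignored."""
    total = sum(counts.get(w, 0) + alpha for w in VOCAB)
    return lambda word, history: (counts.get(word, 0) + alpha) / total

style_lm = make_unigram_lm({"the": 5, "covers": 2, "today": 2})
topic_lm = make_unigram_lm({"markov": 4, "models": 4, "lecture": 2})

def interpolate(prob_fns, weights, word, history):
    """Static linear interpolation: p(w|h) = sum_i lambda_i * p_i(w|h)."""
    return sum(lam * p(word, history) for lam, p in zip(weights, prob_fns))

def perplexity(prob_fns, weights, text):
    """Per-word perplexity of the interpolated model on a token list."""
    log_prob = sum(
        math.log(interpolate(prob_fns, weights, w, tuple(text[max(0, i - 2):i])))
        for i, w in enumerate(text)
    )
    return math.exp(-log_prob / len(text))

doc = ["the", "lecture", "covers", "markov", "models", "today"]
print(perplexity([style_lm, topic_lm], [0.6, 0.4], doc))
```

The "adaptive" variant the abstract contrasts with this would re-estimate the mixture weights per target document (e.g., via EM on a first-pass transcript) rather than fixing them in advance.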