Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures

Authors:
Ivan Bulyko; Mari Ostendorf; Andreas Stolcke
Affiliations:
University of Washington, Seattle, WA; University of Washington, Seattle, WA; SRI International, CA
Venue:
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Year:
2003

Abstract

Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web that is filtered to match the style and/or topic of the target recognition task. We also show that larger performance gains can be obtained from this data by using class-dependent interpolation of N-grams.
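
To make the idea of class-dependent interpolation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the mixture weights are selected by the class of the immediately preceding word and re-estimated with EM on held-out data; the abstract does not specify either choice. The names `make_model`, `word_class`, `interpolate`, and `em_step`, the "<default>" fallback class, and the toy probability tables are all hypothetical.

```python
# Sketch of class-dependent N-gram mixture weights (illustrative assumptions only).

def make_model(table, floor=1e-6):
    """Toy bigram model: P(word | previous word) from a lookup table (hypothetical)."""
    def model(word, history):
        prev = history[-1] if history else None
        return table.get((prev, word), floor)
    return model

def _class_of(history, weights, word_class):
    """Class of the previous word, falling back to '<default>' if unknown."""
    cls = word_class(history[-1]) if history else "<default>"
    return cls if cls in weights else "<default>"

def interpolate(word, history, models, weights, word_class):
    """Mixture probability using the weight vector chosen by the history class."""
    lambdas = weights[_class_of(history, weights, word_class)]
    return sum(lam * m(word, history) for lam, m in zip(lambdas, models))

def em_step(heldout, models, weights, word_class):
    """One EM update of the per-class mixture weights on (history, word) pairs."""
    counts = {cls: [0.0] * len(models) for cls in weights}
    for history, word in heldout:
        cls = _class_of(history, weights, word_class)
        parts = [lam * m(word, history) for lam, m in zip(weights[cls], models)]
        total = sum(parts)
        if total == 0.0:
            continue
        # Accumulate the posterior probability that each component produced the word.
        for i, p in enumerate(parts):
            counts[cls][i] += p / total
    new_weights = {}
    for cls, c in counts.items():
        z = sum(c)
        # Classes never seen in the held-out data keep their previous weights.
        new_weights[cls] = [x / z for x in c] if z > 0 else list(weights[cls])
    return new_weights

if __name__ == "__main__":
    # Made-up numbers: an in-domain conversational model and a web-text model.
    in_domain = make_model({("uh", "huh"): 0.5, ("the", "cat"): 0.1})
    web = make_model({("uh", "huh"): 0.01, ("the", "cat"): 0.3})
    models = [in_domain, web]
    word_class = lambda w: "filler" if w in {"uh", "um"} else "content"
    weights = {c: [0.5, 0.5] for c in ("filler", "content", "<default>")}
    heldout = [(["uh"], "huh")] * 5 + [(["the"], "cat")] * 5
    for _ in range(10):
        weights = em_step(heldout, models, weights, word_class)
    print(weights)
```

In this toy setup the weights after a filler word drift toward the in-domain model, while the weights after a content word drift toward the web model. That is the intuition behind letting the mixture weights vary by class rather than using a single global weight per source.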