Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures

  • Authors:
  • Ivan Bulyko; Mari Ostendorf; Andreas Stolcke

  • Affiliations:
  • University of Washington, Seattle, WA; University of Washington, Seattle, WA; SRI International, CA

  • Venue:
  • NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
  • Year:
  • 2003

Abstract

Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, and that larger performance gains can be obtained from this data by using class-dependent interpolation of N-grams.
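
To illustrate the idea of class-dependent interpolation, here is a minimal sketch in Python. It assumes the mixture weights are selected by a class assigned to the preceding word, which the abstract itself does not spell out. The function name, the toy bigram tables, the class labels, and the weight values are all hypothetical and serve only to show the mechanics of mixing an in-domain model with a web-text model.

```python
from typing import Callable, Dict, Sequence, Tuple

History = Tuple[str, ...]
# A component model maps (word, history) to a conditional probability.
NgramModel = Callable[[str, History], float]


def make_bigram_model(table: Dict[Tuple[str, str], float], floor: float) -> NgramModel:
    """Build a toy bigram model from a lookup table (illustrative only)."""
    def model(word: str, history: History) -> float:
        prev = history[-1] if history else "<s>"
        return table.get((prev, word), floor)
    return model


def class_dependent_mixture(
    word: str,
    history: History,
    models: Sequence[NgramModel],
    weights_by_class: Dict[str, Sequence[float]],
    word_class: Dict[str, str],
    default_class: str = "OTHER",
) -> float:
    """Interpolate component models with weights chosen by the class of
    the preceding word (an assumption made for this sketch)."""
    prev = history[-1] if history else None
    cls = word_class.get(prev, default_class) if prev is not None else default_class
    weights = weights_by_class[cls]
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per component model")
    return sum(lam * m(word, history) for lam, m in zip(weights, models))


# Toy data: hypothetical probabilities, not results from the paper.
in_domain = make_bigram_model({("i", "think"): 0.5}, floor=0.001)
web_text = make_bigram_model({("i", "think"): 0.1}, floor=0.001)

weights_by_class = {
    "PRON": (0.9, 0.1),   # trust the in-domain model more after pronouns
    "OTHER": (0.5, 0.5),  # otherwise mix the two sources evenly
}
word_class = {"i": "PRON"}

p = class_dependent_mixture(
    "think", ("i",), [in_domain, web_text], weights_by_class, word_class
)
```

With a single class, this reduces to ordinary linear interpolation with one global set of weights; letting the weights vary by class allows each source to be trusted more or less depending on the context.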