Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model

  • Authors:
  • Anoop Deoras (Microsoft Corporation, 1065 La Avenida, Mountain View, CA 94043, United States)
  • Tomáš Mikolov (Speech@FIT, Brno University of Technology, Brno, Czech Republic)
  • Stefan Kombrink (Speech@FIT, Brno University of Technology, Brno, Czech Republic)
  • Kenneth Church (IBM T.J. Watson Research Center, Yorktown Heights, NY, United States)

  • Venue:
  • Speech Communication
  • Year:
  • 2013

Abstract

In this paper, we present strategies for incorporating long-context information directly during first-pass decoding and during second-pass lattice re-scoring in speech recognition systems. Long-span language models that capture complex syntactic and/or semantic information are seldom used in the first pass of large-vocabulary continuous speech recognition systems because of the prohibitive growth of the sentence-hypothesis search space. Typically, n-gram language models are used in the first pass to produce N-best lists, which are then re-scored using long-span models. Such a pipeline produces biased first-pass output, resulting in sub-optimal performance during re-scoring. In this paper we show that computationally tractable variational approximations of long-span, complex language models are a better choice than the standard n-gram model, both for first-pass decoding and for lattice re-scoring.
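
As a rough sketch of the underlying idea (not the authors' exact pipeline), a long-span model p can be approximated by the n-gram model q that minimizes the Kullback-Leibler divergence KL(p || q); within the n-gram family, this reduces to estimating q from text sampled from p. The Python snippet below illustrates this under stated assumptions: long_span_model is a hypothetical object (e.g., an RNN language model) whose next_distribution(history) method returns a word distribution given the full left context; all names and interfaces here are illustrative, not from the paper.

    import random
    from collections import defaultdict

    def sample_sentence(long_span_model, max_len=30):
        """Ancestral sampling of one sentence from a long-span LM.
        next_distribution(history) is a hypothetical interface that
        returns {word: probability} given the full left context."""
        history = ["<s>"]
        while len(history) < max_len:
            dist = long_span_model.next_distribution(history)
            words, probs = zip(*dist.items())
            word = random.choices(words, weights=probs)[0]
            if word == "</s>":
                break
            history.append(word)
        return history[1:]  # drop the start symbol

    def estimate_ngram(sentences, n=3):
        """Maximum-likelihood n-gram estimates from the sampled corpus
        (a real system would add smoothing, e.g. Kneser-Ney)."""
        counts = defaultdict(lambda: defaultdict(int))
        for sent in sentences:
            padded = ["<s>"] * (n - 1) + sent + ["</s>"]
            for i in range(n - 1, len(padded)):
                context = tuple(padded[i - n + 1:i])
                counts[context][padded[i]] += 1
        model = {}
        for context, followers in counts.items():
            total = sum(followers.values())
            model[context] = {w: c / total for w, c in followers.items()}
        return model

    # Usage sketch: draw a large corpus from the long-span model, then
    # build the tractable n-gram approximation for first-pass decoding.
    # corpus = [sample_sentence(rnn_lm) for _ in range(1_000_000)]
    # q = estimate_ngram(corpus, n=3)

Because the resulting approximation is itself an ordinary n-gram model, it plugs directly into a standard first-pass decoder, which is what makes the long-span information usable before the re-scoring stage.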