Unsupervised vocabulary adaptation for morph-based language models

  • Authors:
  • André Mansikkaniemi; Mikko Kurimo

  • Affiliations:
  • Aalto University School of Science, Aalto, Finland; Aalto University School of Science, Aalto, Finland

  • Venue:
  • WLM '12: Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
  • Year:
  • 2012

Abstract

Modeling foreign entity names is an important unsolved problem in morpheme-based language modeling, an approach common for morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically in in-domain text using letter n-gram perplexity. Over-segmented foreign entity names are restored to their base forms in the morph-segmented in-domain text for easier and more reliable modeling and recognition. Finally, adapted pronunciation rules are generated with a trainable grapheme-to-phoneme converter. In terms of ASR performance, the unsupervised method almost matches supervised adaptation in correctly recognizing foreign entity names.
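
The detection step mentioned in the abstract, scoring words by letter n-gram perplexity, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the n-gram order, the smoothing, or the selection threshold, so the bigram order, add-one smoothing, and the function names `train_letter_bigrams`, `letter_perplexity`, and `rank_foreign_candidates` are all assumptions made for illustration. The idea is that a letter model trained on native words assigns a high perplexity to words with unfamiliar spelling, which makes them foreign word candidates.

```python
import math
from collections import Counter

# Hypothetical word-boundary symbols, chosen for this sketch only.
BOS, EOS = "<", ">"


def train_letter_bigrams(words):
    """Count letter bigrams over a list of native-language words."""
    bigrams, contexts, alphabet = Counter(), Counter(), {EOS}
    for word in words:
        chars = [BOS] + list(word.lower()) + [EOS]
        alphabet.update(chars[1:])
        for prev, cur in zip(chars, chars[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet


def letter_perplexity(word, model):
    """Per-letter perplexity of a word under an add-one smoothed bigram model.

    Add-one smoothing is an assumption of this sketch, not the paper's choice.
    """
    bigrams, contexts, alphabet = model
    vocab_size = len(alphabet) + 1  # one extra slot for unseen letters
    chars = [BOS] + list(word.lower()) + [EOS]
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(chars) - 1))


def rank_foreign_candidates(words, model):
    """Sort distinct words from highest to lowest letter perplexity."""
    return sorted(set(words), key=lambda w: letter_perplexity(w, model), reverse=True)


if __name__ == "__main__":
    # Toy native word list; a real system would train on a large lexicon.
    native = ["talo", "katu", "kala", "sana", "maa", "kalat", "talot", "sanat"]
    model = train_letter_bigrams(native)
    for word in rank_foreign_candidates(["kala", "talo", "xqzw"], model):
        print(word, round(letter_perplexity(word, model), 2))
```

In such a setup the highest-ranked words would be passed on as candidates for the later steps the abstract describes (restoring over-segmented names and generating pronunciations); how many candidates to keep is a tuning choice that this sketch leaves open.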