Unsupervised vocabulary adaptation for morph-based language models

Authors:
André Mansikkaniemi; Mikko Kurimo
Affiliations:
Aalto University School of Science, Aalto, Finland; Aalto University School of Science, Aalto, Finland
Venue:
WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
Year:
2012

Abstract

Modeling foreign entity names is an important unsolved problem in morpheme-based language modeling, an approach common for morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically from in-domain text using letter n-gram perplexity. Over-segmented foreign entity names are restored to their base forms in the morph-segmented in-domain text, which makes them easier to model and recognize reliably. Finally, the adapted pronunciation rules are generated with a trainable grapheme-to-phoneme converter. In ASR experiments, the unsupervised method nearly matches supervised adaptation in correctly recognizing foreign entity names.
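To illustrate the detection step, here is a minimal sketch of scoring words by letter n-gram perplexity: a letter model is trained on native words, and words whose letter sequences it finds improbable (high perplexity) are flagged as foreign candidates. The model order (trigram), add-one smoothing, the perplexity threshold, the toy Finnish training words, and the names LetterNgramModel and detect_foreign_candidates are all hypothetical choices for illustration, not the configuration used in the paper.

```python
import math
from collections import Counter


class LetterNgramModel:
    """Letter n-gram model with add-one smoothing.

    Assumption for illustration: order 3 and add-one smoothing; the
    original method's model order and smoothing may differ.
    """

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.alphabet = set()

    def _pad(self, word):
        # '<' pads the word start so the first letters get a full context;
        # '>' marks the word end so word endings are scored too.
        return "<" * (self.n - 1) + word.lower() + ">"

    def train(self, words):
        for word in words:
            padded = self._pad(word)
            self.alphabet.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.ngram_counts[context + padded[i]] += 1
                self.context_counts[context] += 1

    def log_prob(self, context, letter):
        # The +1 in the vocabulary size reserves probability mass for unseen letters.
        vocab_size = len(self.alphabet) + 1
        count = self.ngram_counts[context + letter]
        return math.log((count + 1) / (self.context_counts[context] + vocab_size))

    def perplexity(self, word):
        # Per-letter perplexity: exp of the average negative log-probability.
        padded = self._pad(word)
        positions = range(self.n - 1, len(padded))
        total = sum(self.log_prob(padded[i - self.n + 1:i], padded[i]) for i in positions)
        return math.exp(-total / len(positions))


def detect_foreign_candidates(model, words, threshold):
    """Return words whose letter perplexity exceeds a (hypothetical) threshold,
    ordered from most to least foreign-looking."""
    scored = sorted(((model.perplexity(w), w) for w in set(words)), reverse=True)
    return [w for ppl, w in scored if ppl > threshold]


# Hypothetical toy training list of native (Finnish) word forms.
native_words = ["talo", "talossa", "kissa", "kissalla", "auto", "autossa",
                "koira", "koiralla", "sana", "sanassa", "maa", "maassa",
                "kala", "kalassa"]
model = LetterNgramModel(n=3)
model.train(native_words)
print(detect_foreign_candidates(model, ["taloissa", "washington"], threshold=10.0))
```

Here taloissa, built mostly from letter sequences seen in the native list, stays below the threshold, while washington, whose letters w, h and g never occur in the training words, is flagged. A fixed threshold is only one option; ranking words by perplexity and keeping the top N candidates would be an equally plausible cutoff, and either would presumably need tuning on held-out data.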