Modeling foreign entity names remains an important unsolved problem in morpheme-based language modeling, an approach commonly used for morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically in in-domain text using letter n-gram perplexity. Foreign entity names that have been over-segmented are restored to their base forms in the morph-segmented in-domain text, which makes them easier to model and recognize reliably. Finally, adapted pronunciation rules are generated with a trainable grapheme-to-phoneme converter. In ASR experiments, the unsupervised method nearly matches supervised adaptation in correctly recognizing foreign entity names.
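The detection step can be illustrated with a minimal sketch. It assumes a letter trigram model with add-one smoothing trained on native-language words, and flags a word as a foreign candidate when its per-letter perplexity exceeds a threshold. The n-gram order, the smoothing, the threshold, and the function names (train_letter_ngrams, letter_perplexity, foreign_candidates) are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def train_letter_ngrams(words, n=3):
    """Count letter n-grams and their (n-1)-letter contexts in native words."""
    ngrams, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        # "^" pads the word start, "$" marks the word end (assumed symbols).
        padded = "^" * (n - 1) + word.lower() + "$"
        alphabet.update(padded)
        for i in range(len(padded) - n + 1):
            ngrams[padded[i:i + n]] += 1
            contexts[padded[i:i + n - 1]] += 1
    return ngrams, contexts, len(alphabet)


def letter_perplexity(word, model, n=3):
    """Per-letter perplexity of a word under the add-one smoothed model."""
    ngrams, contexts, alphabet_size = model
    padded = "^" * (n - 1) + word.lower() + "$"
    log_prob = 0.0
    steps = 0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # Add-one smoothing is a simplification chosen for this sketch.
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + alphabet_size)
        log_prob += math.log(prob)
        steps += 1
    return math.exp(-log_prob / steps)


def foreign_candidates(words, model, threshold, n=3):
    """Return the words whose letter perplexity exceeds the threshold."""
    return [w for w in words if letter_perplexity(w, model, n) > threshold]


if __name__ == "__main__":
    # Toy native vocabulary (assumed data, for illustration only).
    native = ["talo", "talossa", "katu", "kadulla", "kala", "kalassa"]
    model = train_letter_ngrams(native)
    for w in ["talo", "schwarzenegger"]:
        print(w, round(letter_perplexity(w, model), 2))
```

Words built from letter sequences that are rare in the native training text receive higher perplexity, so they rise to the top of the candidate list. In practice the threshold would have to be tuned on held-out data.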