An improved two-stage mixed language model approach for handling out-of-vocabulary words in large vocabulary continuous speech recognition

Authors:
Bert Réveil;Kris Demuynck;Jean-Pierre Martens
Affiliations:
-;-;-
Venue:
Computer Speech and Language
Year:
2014

Citing 5
Cited 0

An efficient search space representation for large vocabulary continuous speech recognition

Speech Communication
Modelling unknown words in spontaneous speech

ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
Contextual information improves OOV detection in speech

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Improving proper name recognition by means of automatically learned pronunciation variants

Speech Communication
Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents a two-stage mixed language model technique for detecting and recognizing words that are not included in the vocabulary of a large vocabulary continuous speech recognition system. The main idea is to spot the out-of-vocabulary words and to produce a transcription for these words in terms of subword units with the help of a mixed word/subword language model in the first stage, and to convert the subword transcriptions to word hypotheses by means of a look-up table in the second stage. The performance of the proposed approach is compared to that of the state-of-the-art hybrid method reported in the literature, both on in-domain and on out-of-domain Dutch spoken material, where the term 'domain' refers to the ensemble of topics that were covered in the material from which the lexicon and language model were retrieved. It turns out that the proposed approach is at least equally effective as a hybrid approach when it comes to recognizing in-domain material, and significantly more effective when applied to out-of-domain data. This proves that the proposed approach is easily adaptable to new domains and to new words (e.g. proper names) in the same domain. On the out-of-domain recognition task, the word error rate could be reduced by 12% relative over a baseline system incorporating a 100k word vocabulary and a basic garbage OOV word model.