This paper reports on daily vocabulary and language model adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese, such as its high level of inflection and its complex verbal system. We propose a multi-pass speech recognition framework that uses contemporary written texts available daily on the Web. The framework uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus to select an optimal vocabulary each day. Using an information retrieval engine, with the ASR hypotheses as query material, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. Applied to a live closed-captioning system that runs daily on TV broadcasts, the framework proved effective: relative to a baseline system with the same vocabulary size, it reduced the out-of-vocabulary (OOV) word rate by 69% and the word error rate (WER) by 12.0%.
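The figures above are relative reductions, which are easy to confuse with absolute ones. As a minimal sketch of how such figures are usually computed, the snippet below measures the OOV rate of a token sequence against a vocabulary and then the relative reduction between a baseline and an adapted vocabulary. The function names, the sample sentence, and both vocabularies are hypothetical toy data chosen for illustration; they are not taken from the paper or its system.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def relative_reduction(baseline, adapted):
    """Relative reduction of a rate: (baseline - adapted) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline - adapted) / baseline


# Toy data (hypothetical): one reference sentence and two vocabularies
# of different coverage.
tokens = "o governo anunciou novas medidas para a economia".split()
baseline_vocab = {"o", "governo", "novas", "para", "a"}
adapted_vocab = baseline_vocab | {"anunciou", "medidas"}

baseline_oov = oov_rate(tokens, baseline_vocab)  # 3 of 8 tokens missing
adapted_oov = oov_rate(tokens, adapted_vocab)    # 1 of 8 tokens missing
reduction = relative_reduction(baseline_oov, adapted_oov)

print(f"baseline OOV rate: {baseline_oov:.3f}")
print(f"adapted OOV rate:  {adapted_oov:.3f}")
print(f"relative reduction: {reduction:.1%}")
```

The same `relative_reduction` helper applies to WER, given baseline and adapted error rates measured on the same test set. Note that the toy example lets the adapted vocabulary grow, whereas the abstract compares systems with the same vocabulary size.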