TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
This paper describes work on building speaker-clustered acoustic models as part of a real-time LVCSR system that Czech TV has used for more than a year to automatically subtitle parliament meetings broadcast on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than a single gender-independent model or even gender-dependent models. Because speakers change frequently and the LVCSR system is connected directly to the audio channel, the models must be switched or fused automatically and as quickly as possible. An important part of the solution is the real-time likelihood evaluation of all clustered acoustic models, which takes advantage of a fast GPU (Graphics Processing Unit). The proposed method achieved a relative WER reduction of over 2.34% compared with the baseline gender-independent model, with more than 2M Gaussian mixtures evaluated in real time.
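The switching and fusion idea can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on the CPU in plain Python rather than on a GPU, and the model layout (a list of `(weight, means, variances)` tuples), the sliding-window length, the softmax-style fusion weights, and the names `gmm_frame_loglik`, `ClusterModelSelector`, and `fusion_weights` are all assumptions made for illustration. Each incoming feature frame is scored against every cluster model, the log-likelihoods are accumulated over a short window, and the best-scoring model is selected (or all models are weighted).

```python
import math
from collections import deque


def gmm_frame_loglik(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples; this layout is a
    hypothetical one chosen for the sketch.
    """
    comp_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        comp_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(v - top) for v in comp_logs))


class ClusterModelSelector:
    """Scores every cluster model on each frame and tracks the best one
    over a sliding window (the window length is an assumed parameter)."""

    def __init__(self, models, window=50):
        self.models = models                  # dict: model name -> GMM
        self.history = deque(maxlen=window)   # per-frame scores of all models

    def update(self, frame):
        scores = {name: gmm_frame_loglik(frame, gmm)
                  for name, gmm in self.models.items()}
        self.history.append(scores)
        totals = {name: sum(h[name] for h in self.history)
                  for name in self.models}
        best = max(totals, key=totals.get)
        return best, totals


def fusion_weights(totals):
    """Turn accumulated log-likelihoods into normalized fusion weights
    (a softmax; one possible fusion rule, assumed here)."""
    top = max(totals.values())
    exps = {name: math.exp(v - top) for name, v in totals.items()}
    norm = sum(exps.values())
    return {name: e / norm for name, e in exps.items()}
```

Calling `update` once per frame returns the currently preferred model together with the accumulated scores, which can be passed to `fusion_weights` when fusion is wanted instead of a hard switch. A shorter window reacts faster to a speaker change but is noisier; the appropriate length would have to be tuned on real data.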