Speaker-clustered acoustic models evaluated on GPU for on-line subtitling of parliament meetings

Authors:
Josef V. Psutka;Jan Vaněk;Josef Psutka
Affiliations:
Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic
Venue:
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Year:
2011

Abstract

This paper describes the development of speaker-clustered acoustic models as part of the real-time LVCSR system that has been used for more than a year by the Czech TV for automatic subtitling of parliament meetings broadcast on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than a single gender-independent model or even gender-dependent models. Frequent speaker changes and the direct connection of the LVCSR system to the audio channel require automatic switching/fusion of models that is as fast as possible. An important part of the solution is the real-time likelihood evaluation of all clustered acoustic models, which takes advantage of a fast GPU (Graphics Processing Unit). The proposed method achieved a relative WER reduction of more than 2.34% over the baseline gender-independent model, with more than 2M Gaussian mixtures evaluated in real time.
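The idea of scoring every clustered model on the incoming audio and switching to the best one can be illustrated with a small CPU-only sketch. It is not the authors' implementation: the abstract does not specify the switching/fusion rule, so the sketch assumes the simplest one (pick the model with the highest summed log-likelihood over a short window of frames). The diagonal-covariance GMMs, the function names `log_gauss_diag`, `gmm_loglik` and `pick_model`, and the toy models `cluster_a` and `cluster_b` are all hypothetical. In a system like the one described, the per-frame, per-Gaussian terms would be the part evaluated in parallel on the GPU.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def pick_model(frames, models):
    """Sum per-frame log-likelihoods for each model and return the best one.

    Assumed selection rule (not taken from the paper): highest total wins.
    """
    totals = {
        name: sum(gmm_loglik(x, gmm) for x in frames)
        for name, gmm in models.items()
    }
    return max(totals, key=totals.get), totals

# Toy data: two hypothetical single-component cluster models in 2-D.
models = {
    "cluster_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "cluster_b": [(1.0, [3.0, 3.0], [1.0, 1.0])],
}
frames = [[2.8, 3.1], [3.2, 2.9], [2.9, 3.0]]
best, totals = pick_model(frames, models)
print(best)
```

Because every (frame, Gaussian) term is independent of the others, the inner sums map naturally onto data-parallel hardware, which is presumably what makes evaluating all clustered models at once feasible in real time.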