Speaker-clustered acoustic models evaluated on GPU for on-line subtitling of parliament meetings

  • Authors:
  • Josef V. Psutka;Jan Vaněk;Josef Psutka

  • Affiliations:
  • Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic;Department of Cybernetics, West Bohemia University, Pilsen, Czech Republic

  • Venue:
  • TSD'11: Proceedings of the 14th International Conference on Text, Speech and Dialogue
  • Year:
  • 2011

Abstract

This paper describes the construction of speaker-clustered acoustic models as part of a real-time LVCSR system that Czech TV has used for more than a year to automatically subtitle parliament meetings broadcast on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than a single gender-independent model or even gender-dependent models. Because speakers change frequently and the LVCSR system is connected directly to the audio channel, the models must be switched or fused automatically and as quickly as possible. An important part of the solution is the real-time likelihood evaluation of all clustered acoustic models, which takes advantage of a fast GPU (Graphics Processing Unit). The proposed method achieved a relative WER reduction of over 2.34% compared with the baseline gender-independent model, with more than 2M Gaussian mixtures evaluated in real time.
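
The idea of evaluating every clustered model on the incoming audio and then switching to the best one can be illustrated with a minimal CPU sketch. It is not the authors' implementation: the abstract does not give the switching/fusion rule, so picking the model with the highest summed log-likelihood over a short window of frames is an assumption, and the names `log_gauss_diag`, `log_gmm`, `select_cluster`, and the toy models are hypothetical. Diagonal covariances are also assumed. Each (frame, Gaussian) term is independent of the others, which is what makes this kind of computation a natural fit for a GPU.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def select_cluster(frames, models):
    """Score every model on all frames; return (best model name, all scores).

    Assumed selection rule: highest summed log-likelihood over the window.
    """
    scores = {
        name: sum(log_gmm(x, gmm) for x in frames)
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy models with 2-dimensional features (illustration only).
models = {
    "cluster_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cluster_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[4.8, 5.1], [5.2, 4.9], [5.0, 5.3]]  # a short window of feature frames
best, scores = select_cluster(frames, models)
print(best)
```

In a real-time setting the window would slide over the audio stream, and a fusion variant could weight the per-model scores instead of taking a hard maximum; both choices are left open here because the abstract does not specify them.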