Robust speech interaction in motorcycle environment

  • Authors:
Iosif Mporas; Otilia Kocsis; Todor Ganchev; Nikos Fakotakis

  • Affiliations:
Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, 26500 Rion-Patras, Greece (all authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Abstract

Aiming at robust spoken dialogue interaction in a motorcycle environment, we investigate various configurations for a speech front-end consisting of speech pre-processing, speech enhancement, and speech recognition components. These components are implemented as agents in the Olympus/RavenClaw framework, which forms the core of the multimodal dialogue interface of a wearable solution providing information support to motorcycle police officers on the move. To optimize speech recognition performance, we consider different experimental setups for the speech front-end. The practical value of various speech enhancement techniques is assessed and, after analysis of their performance, a collaborative scheme is proposed. In this scheme, independent speech enhancement channels operate in parallel on a common input, and their outputs are fed to a multithreaded speech recognition component. The outcome of the speech recognition process is post-processed by an appropriate fusion technique, which contributes to a more accurate interpretation of the input. Among the fusion algorithms investigated, AdaBoost.M1 performed best. Using the collaborative scheme with AdaBoost.M1-based fusion, we achieved a significant improvement of the overall speech recognition performance: accuracy gains of 8.0% in word recognition rate and 5.48% in correctly recognized words, compared to the best single speech enhancement channel. The advance offered in the present work reaches beyond the specifics of this application and can benefit spoken interfaces operating in non-stationary noise environments.
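The collaborative fusion scheme described in the abstract can be sketched as follows. This is an illustrative, simplified reconstruction, not the authors' implementation: here each "weak learner" simply trusts the recognition hypothesis coming from one enhancement channel, and AdaBoost.M1 learns per-channel weights from labelled training utterances; the fused output is then a weighted vote over the channels' hypotheses. The channel names in the usage example are hypothetical placeholders.

```python
# Illustrative AdaBoost.M1-style fusion of parallel recognition channels.
# Weak hypothesis i = "trust enhancement channel i"; boosting assigns each
# channel a weight from its weighted error on labelled training utterances.
import math
from collections import defaultdict

def adaboost_m1_fusion(channel_outputs, labels, n_rounds=None):
    """channel_outputs[i][t]: hypothesis of channel i for training utterance t.
    Returns a dict mapping channel index -> accumulated vote weight."""
    n_channels, n = len(channel_outputs), len(labels)
    if n_rounds is None:
        n_rounds = n_channels
    w = [1.0 / n] * n              # utterance weights
    alphas = defaultdict(float)    # per-channel vote weight
    for _ in range(n_rounds):
        # pick the channel with the lowest weighted error on current weights
        best_i, best_err = None, None
        for i in range(n_channels):
            err = sum(w[t] for t in range(n)
                      if channel_outputs[i][t] != labels[t])
            if best_err is None or err < best_err:
                best_i, best_err = i, err
        if best_err >= 0.5:        # AdaBoost.M1 stops if no learner beats chance
            break
        eps = max(best_err, 1e-10)
        beta = eps / (1.0 - eps)
        alphas[best_i] += math.log(1.0 / beta)
        # down-weight utterances the chosen channel got right, then renormalize
        for t in range(n):
            if channel_outputs[best_i][t] == labels[t]:
                w[t] *= beta
        z = sum(w)
        w = [x / z for x in w]
    return dict(alphas)

def fuse(alphas, hyps):
    """Weighted vote over per-channel hypotheses for one test utterance."""
    votes = defaultdict(float)
    for i, h in enumerate(hyps):
        votes[h] += alphas.get(i, 0.0)
    return max(votes, key=votes.get)

if __name__ == "__main__":
    # Hypothetical channels (e.g. different enhancement algorithms), each
    # making one error on a 4-utterance command vocabulary.
    channels = [
        ["go",   "go",   "left", "go"],    # channel 0 errs on utterance 0
        ["stop", "go",   "left", "left"],  # channel 1 errs on utterance 3
        ["stop", "stop", "left", "go"],    # channel 2 errs on utterance 1
    ]
    truth = ["stop", "go", "left", "go"]
    alphas = adaboost_m1_fusion(channels, truth)
    # Fusion can outvote a single channel's error on a new utterance:
    print(fuse(alphas, ["go", "stop", "stop"]))
```

The design point this sketch makes is the one the abstract relies on: because the enhancement channels fail on different inputs, a trained weighted vote can outperform the best single channel alone.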