Aiming at robust spoken dialogue interaction in the motorcycle environment, we investigate various configurations of a speech front-end consisting of speech pre-processing, speech enhancement, and speech recognition components. These components are implemented as agents in the Olympus/RavenClaw framework, which forms the core of a multimodal dialogue interface for a wearable solution supporting motorcycle police officers on the move. In the present effort, aiming to optimize speech recognition performance, different experimental setups are considered for the speech front-end. The practical value of various speech enhancement techniques is assessed and, after analysis of their performance, a collaborative scheme is proposed. In this scheme, independent speech enhancement channels operate in parallel on a common input, and their outputs are fed to a multi-threaded speech recognition component. The outcome of the speech recognition process is post-processed by an appropriate fusion technique, which contributes to a more accurate interpretation of the input. Investigating various fusion algorithms, we identified AdaBoost.M1 as the best-performing one. Utilizing the collaborative fusion scheme based on AdaBoost.M1, a significant improvement of overall speech recognition performance was achieved: gains of 8.0% in word recognition rate and 5.48% in correctly recognized words, when compared to the performance of the best speech enhancement channel alone. The advance offered in the present work reaches beyond the specifics of this application and can benefit spoken interfaces operating in non-stationary noise environments.
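The fusion step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the example hypotheses, and the channel weights are hypothetical. It only shows the weighted-vote decision rule that AdaBoost.M1-style fusion reduces to at recognition time, where each enhancement channel's recognizer acts as a weak classifier whose weight (in AdaBoost.M1, ln((1 - err) / err)) is learned offline.

```python
from collections import defaultdict

def fuse_hypotheses(channel_hypotheses, channel_weights):
    """Weighted-vote fusion of per-channel word hypotheses.

    channel_hypotheses: one recognized word per enhancement channel.
    channel_weights: non-negative per-channel weights, e.g. the
    AdaBoost.M1 classifier weights learned on held-out data
    (hypothetical values are used in the example below).
    """
    scores = defaultdict(float)
    for word, weight in zip(channel_hypotheses, channel_weights):
        scores[word] += weight
    # The fused output is the word with the largest accumulated weight.
    return max(scores, key=scores.get)

# Three hypothetical channels disagree; the two lighter channels
# together (0.9 + 0.8) outweigh the single heaviest one (1.2).
print(fuse_hypotheses(["stop", "stop", "shop"], [0.9, 0.8, 1.2]))  # -> stop
```

In the paper's setting the "words" would be the parallel recognizer outputs for the same utterance; the sketch omits confidence scores and lattice-level combination, which a full system would likely use.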