Intelligent Multi-modal Interfaces for Mobile Applications in Hostile Environment (IM-HOST)

Authors:
Claude Stricker;Jean-Frédéric Wagen;Guillermo Aradilla;Hervé Bourlard;Hynek Hermansky;Joel Pinto;Paul-Henri Rey;Jérôme Théraulaz
Affiliations:
AISTS, Lausanne, CH-1015 and HES-SO Valais, Sierre, CH-3960;EIA-FR (HES-SO Fribourg), Fribourg, CH-1705;Idiap Research Institute, Martigny, CH-1920;Idiap Research Institute, Martigny, CH-1920;Idiap Research Institute, Martigny, CH-1920;Idiap Research Institute, Martigny, CH-1920;AISTS, Lausanne, CH-1015 and HES-SO Valais, Sierre, CH-3960;EIA-FR (HES-SO Fribourg), Fribourg, CH-1705
Venue:
Human Machine Interaction
Year:
2009

Abstract

Multi-modal interfaces for mobile applications include tiny screens, keyboards, touch screens, earphones, microphones and software components for voice-based man-machine interaction. The voice recognition software and the microphone are of primary importance in a noisy environment. The performance of current voice applications is reasonably good in quiet environments. However, in many practical situations the surrounding noise severely degrades the quality of the speech signal. As a consequence, the recognition rate decreases significantly. Noise management is therefore a major focus in developing voice-enabled technologies. This project addresses the problem of voice recognition with the goal of reaching a high success rate (ideally above 99%) in an outdoor environment that is noisy and hostile: the user stands on the open deck of a motorboat and uses his/her voice to command applications running on a laptop via a wireless microphone. In addition to the problem of noise, other constraints strongly limit the hardware options. Furthermore, the user must perform several tasks simultaneously. The success of the solution therefore relies on the efficiency and effectiveness of the voice recognition algorithm and on the choice of microphone. In addition, the training of the recognizer should be kept to a minimum and recognition should take no longer than 3 seconds. For these two reasons, only a limited set of voice commands has been tested. A first demonstrator based on digit keyword spotting, trained on phone speech, performed poorly in very noisy conditions. A second demonstrator combining neural network and template matching techniques led to nearly acceptable results when the user recorded the keywords. Since the recognition rate was estimated at around 90%, no additional field test was undertaken.
This R&D project shows that state-of-the-art voice recognition needs further investigation before spoken keywords can be recognized reliably in noisy environments. In addition to ongoing improvements, unconventional research approaches worth testing include deriving keywords adapted to specialized algorithms and having the user learn these keywords.
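
The abstract does not say how template matching was carried out in the second demonstrator. As an illustration only, the sketch below assumes dynamic time warping (DTW), a common template matching technique for small keyword sets: each spoken keyword is compared with templates recorded by the user, and the input is rejected when no template is close enough. The function names, the Euclidean frame distance, the toy one-dimensional features and the rejection threshold are all hypothetical and are not taken from the project.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(query, template):
    """Length-normalized DTW alignment cost between two frame sequences."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeated
                                 cost[i][j - 1],      # template frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m] / (n + m)

def recognize(query, templates, reject_threshold):
    """Return the closest keyword, or None if no template is close enough."""
    best_word, best_cost = None, float("inf")
    for word, examples in templates.items():
        for example in examples:
            c = dtw_distance(query, example)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word if best_cost <= reject_threshold else None

# Toy data (assumption): one recorded template per keyword, 1-D features.
templates = {
    "start": [[[0.0], [1.0], [2.0], [1.0]]],
    "stop": [[[2.0], [2.0], [0.0], [0.0]]],
}
query = [[0.0], [0.1], [1.0], [2.1], [1.0]]  # slower, slightly noisy "start"
result = recognize(query, templates, reject_threshold=0.5)
print(result)
```

In a real system the frames would be acoustic feature vectors (or, in a hybrid setup, neural network outputs) rather than single numbers, and the rejection threshold would have to be tuned on recordings made in the target noise conditions.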