Intelligent Multi-modal Interfaces for Mobile Applications in Hostile Environment (IM-HOST)

  • Authors:
  • Claude Stricker;Jean-Frédéric Wagen;Guillermo Aradilla;Hervé Bourlard;Hynek Hermansky;Joel Pinto;Paul-Henri Rey;Jérôme Théraulaz

  • Affiliations:
  • AISTS, Lausanne, CH-1015 and HES-SO Valais, Sierre, CH-3960;EIA-FR (HES-SO Fribourg), Fribourg, CH-1705;Idiap Research Institute, Martigny, CH-1920;Idiap Research Institute, Martigny, CH-1920;Idiap Research Institute, Martigny, CH-1920;Idiap Research Institute, Martigny, CH-1920;AISTS, Lausanne, CH-1015 and HES-SO Valais, Sierre, CH-3960;EIA-FR (HES-SO Fribourg), Fribourg, CH-1705

  • Venue:
  • Human Machine Interaction
  • Year:
  • 2009

Abstract

Multi-modal interfaces for mobile applications include tiny screens, keyboards, touch screens, earphones, microphones and software components for voice-based man-machine interaction. The voice recognition software and the microphone are of primary importance in a noisy environment. Current voice applications perform reasonably well in quiet environments. However, in many practical situations the surrounding noise severely degrades the quality of the speech signal, and the recognition rate drops significantly as a result. Noise management is a major focus in developing voice-enabled technologies. This project addresses voice recognition with the goal of reaching a high success rate (ideally above 99%) in a noisy and hostile outdoor environment: the user stands on the open deck of a motorboat and uses his/her voice to command applications running on a laptop through a wireless microphone. Besides noise, other constraints strongly limit the hardware options, and the user must also perform several tasks simultaneously. The success of the solution must therefore rely on the efficiency and effectiveness of the voice recognition algorithm and on the choice of microphone. In addition, training of the recognizer should be kept to a minimum and recognition should take no longer than 3 seconds. For these two reasons, only a limited set of voice commands was tested. A first demonstrator, based on digit keyword spotting trained on telephone speech, performed poorly in very noisy conditions. A second demonstrator, combining neural network and template matching techniques, led to nearly acceptable results when the user recorded the keywords. Since its recognition rate was estimated at around 90%, well short of the 99% goal, no additional field test was undertaken.
This R&D project shows that state-of-the-art research on voice recognition needs further investigation before spoken keywords can be reliably recognized in noisy environments. Besides ongoing improvements, unconventional research approaches worth testing include deriving keywords adapted to specialized algorithms and having the user learn these keywords.
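The abstract mentions that the second demonstrator used template matching against keywords recorded by the user, but it does not say which matching algorithm was used. The following is only a minimal sketch of one common form of template matching, dynamic time warping (DTW), under stated assumptions. Each utterance is assumed to be already converted to a sequence of per-frame feature vectors, the local distance is assumed to be Euclidean, and the names `dtw_distance` and `recognize` are hypothetical. Per-frame neural network outputs could in principle serve as the features, but how the project actually combined the two techniques is not stated here.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (assumed representation)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """DTW alignment cost between two frame sequences, divided by n + m."""
    n, m = len(query), len(template)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # allowed steps: repeat a template frame, skip ahead, or advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # normalize so that short and long templates are compared more fairly
    return cost[n][m] / (n + m)


def recognize(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the keyword whose user-recorded template is closest to the query."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))


if __name__ == "__main__":
    # toy one-dimensional "features", purely illustrative
    templates = {"start": [[0.0], [1.0], [2.0]], "stop": [[2.0], [1.0], [0.0]]}
    print(recognize([[0.0], [0.1], [1.0], [2.0]], templates))  # prints "start"
```

DTW lets a keyword spoken faster or slower than its recorded template still align frame by frame. This is one reason template matching suits a setup where the user records only a few keywords and extensive recognizer training is to be avoided. A real keyword spotter would also need a rejection threshold on the distance, so that noise or out-of-vocabulary speech is not forced onto the nearest keyword.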