Introduction to statistical pattern recognition (2nd ed.)
Introduction to statistical pattern recognition (2nd ed.)
Fundamentals of speech recognition
Fundamentals of speech recognition
The limits of speech recognition
Communications of the ACM
Connectionist Speech Recognition: A Hybrid Approach
Connectionist Speech Recognition: A Hybrid Approach
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)-Volume 3 - Volume 3
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
TSD'05 Proceedings of the 8th international conference on Text, Speech and Dialogue
Hi-index | 0.00 |
Multi-modal interfaces for mobile applications include tiny screens, keyboards, touch screens, ear phones, microphones and software components for voice-based man-machine interaction. The software enabling voice recognition, as well as the microphone, are of primary importance in a noisy environment. Current performances of voice applications are reasonably good in quiet environment. However, the surrounding noise in many practical situations largely deteriorates the quality of the speech signal. As a consequence, the recognition rate decreases significantly. Noise management is a major focus in developing voice-enabled technologies. This project addresses the problem of voice recognition with the goal of reaching a high success rate (ideally above 99%) in an outdoor environment that is noisy and hostile: the user stands on an open deck of a motor-boat and use his/her voice to command applications running on a laptop by using a wireless microphone. In addition to the problem of noise, there are other constraints strongly limiting the hardware options. Furthermore, the user must also perform several tasks simultaneously. The success of the solution must rely on the efficiency and effectiveness of the voice recognition algorithm and the choice of the microphone. In addition, the training of the recognizer should be kept to a minimum and the recognition time should not last longer than 3 seconds. For these two reasons, only a limited set of voice commands have been tested. A first demonstrator based on digit keyword spotting trained over phone speech showed poor performances in very noisy conditions. A second demonstrator combining neural network and template matching techniques lead to nearly acceptable results when the user recorded the keywords. Since the recognition rate was approximated around 90%, no additional field test was undertaken. This R&D project shows that state-of-the-art research on voice recognition needs further investigations in order to recognize spoken keywords in noisy environments. In addition to on-going improvements, unconventional research approaches that are worth testing include, deriving adapted keywords to specialized algorithms and having the user learn these keyword.