Robust Recognition of Simultaneous Speech by a Mobile Robot

Authors:
J.-M. Valin;S. Yamamoto;J. Rouat;F. Michaud;K. Nakadai;H. G. Okuno
Affiliations:
Commonwealth Sci. & Ind. Res. Organ. Inf. & Commun. Technol. (CSIROICT) Centre, Sydney;-;-;-;-;-
Venue:
IEEE Transactions on Robotics
Year:
2007

Citing 0
Cited 6

Speaker localization and speech extraction with the EAR sensor
IROS'09 Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems

Study on navigation system of mobile robot based on auditory localization
ROBIO'09 Proceedings of the 2009 international conference on Robotics and biomimetics

Phoneme and tonal accent recognition for Thai speech
Expert Systems with Applications: An International Journal

The ManyEars open framework
Autonomous Robots

ROS open-source audio recognizer: ROAR environmental sound detection tools for robot programming
Autonomous Robots

Binaural active audition for humanoid robots to localise speech over entire azimuth range
Applied Bionics and Biomechanics

Quantified Score

Hi-index	0.00

Abstract

This paper describes a system that gives a mobile robot the ability to perform automatic speech recognition with simultaneous speakers. A microphone array is used along with a real-time implementation of geometric source separation (GSS) and a postfilter that further reduces interference from other sources. The postfilter is also used to estimate the reliability of spectral features and compute a missing feature mask. The mask is used in a missing feature theory-based speech recognition system to recognize the speech from simultaneous Japanese speakers in the context of a humanoid robot. Recognition rates are presented for three simultaneous speakers located 2 m from the robot. The system was evaluated on a 200-word vocabulary at different azimuths between sources, ranging from 10° to 90°. Compared to the use of the microphone array source separation alone, we demonstrate an average reduction in relative recognition error rate of 24% with the postfilter and of 42% when the missing features approach is combined with the postfilter. We demonstrate the effectiveness of our multisource microphone array postfilter and the improvement it provides when used in conjunction with the missing features theory.
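The idea of a missing feature mask can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not give the reliability measure, so the sketch assumes that a spectral feature counts as reliable when the postfilter keeps at least a fixed fraction of its energy. The function name `missing_feature_mask`, its parameters, and the threshold value of 0.25 are all hypothetical.

```python
def missing_feature_mask(input_energy, output_energy, threshold=0.25):
    """Return a binary mask: 1 = feature treated as reliable, 0 = unreliable.

    ASSUMPTION: reliability is approximated by the fraction of a feature's
    energy that survives the postfilter. The paper's actual criterion is
    not stated in the abstract.
    """
    mask = []
    for e_in, e_out in zip(input_energy, output_energy):
        # Fraction of energy kept by the postfilter; treat silent
        # features (zero input energy) as unreliable.
        ratio = e_out / e_in if e_in > 0 else 0.0
        mask.append(1 if ratio >= threshold else 0)
    return mask


# Hypothetical per-feature energies before and after the postfilter.
before = [1.0, 2.0, 0.0, 4.0]
after = [0.9, 0.2, 0.0, 1.0]
mask = missing_feature_mask(before, after)
print(mask)
```

A recognizer built on missing feature theory would then rely on the features marked 1 and discount those marked 0 when scoring acoustic models.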