Robust automatic speech recognition with missing and unreliable acoustic data
Speech Communication
"Listening to several things at once" is a long-standing human dream and a central goal of AI and robot audition, because psychophysical observations show that people can attend to at most two things at once. Current noise-reduction techniques cannot achieve this goal because they assume quasi-stationary noise rather than interfering speech signals. Since robots are used in diverse environments, robot audition systems should require minimal a priori information about their acoustic environments and speakers. We evaluate a missing-feature-theory approach that interfaces between sound source separation (SSS) and automatic speech recognition (ASR). Its essential part is estimating the reliability of each feature of the separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS method, we developed automatic missing-feature mask generation. The recognition accuracy of two simultaneous speech signals improved to averages of 67.8% and 88.0% for ICA and GSS, respectively.
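To illustrate the core idea of missing-feature mask generation, the following is a minimal sketch (not the authors' implementation): a spectro-temporal feature of a separated signal is marked reliable when its local SNR against the estimated interference leakage exceeds a threshold. The function name, the leakage estimate, and the 0 dB default are assumptions for illustration only.

```python
import numpy as np

def missing_feature_mask(separated, interference, snr_threshold_db=0.0):
    """Hypothetical binary reliability mask for missing-feature ASR.

    separated    : magnitude spectrogram of one separated source
    interference : estimated leakage magnitude from the other sources
    Returns 1.0 where the local SNR (dB) exceeds the threshold
    (feature treated as reliable), 0.0 otherwise (feature masked out).
    """
    eps = 1e-12  # avoid log/divide-by-zero on silent bins
    snr_db = 10.0 * np.log10((separated ** 2 + eps) / (interference ** 2 + eps))
    return (snr_db > snr_threshold_db).astype(float)
```

In a missing-feature recognizer, the masked-out (unreliable) features would then be marginalized or imputed during decoding rather than trusted at face value.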