Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

  • Authors:
  • Shun'ichi Yamamoto; Ryu Takeda; Kazuhiro Nakadai; Mikio Nakano; Hiroshi Tsujino; Jean-Marc Valin; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

  • Affiliations:
  • Graduate School of Informatics, Kyoto University, Japan (Yamamoto, Takeda, Komatani, Ogata, Okuno); Honda Research Institute Japan Co., Ltd., Japan (Nakadai, Nakano, Tsujino); CSIRO ICT Centre, Australia (Valin)

  • Venue:
  • PRICAI '06: Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence
  • Year:
  • 2006

Abstract

"Listening to several things at once" is a people's dream and one goal of AI and robot audition, because people can listen to at most two things at once according to psychophysical observations. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noises, not interfering speech signals. Since robots are used in various environments, robot audition systems require minimum a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is the estimate of reliability of each feature of separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS, automatic missing feature mask generation is developed. The recognition accuracy of two simultaneous speech improved to an average of 67.8 and 88.0% for ICA and GSS, respectively.