Real-time sound source orientation estimation using a 96 channel microphone array

  • Authors:
  • Hirofumi Nakajima; Keiko Kikuchi; Toru Daigo; Yutaka Kaneda; Kazuhiro Nakadai; Yuji Hasegawa

  • Affiliations:
  • Honda Research Institute Japan Co., Ltd., Wako-shi, Saitama, Japan; Tokyo Denki University, Tokyo, Japan; Tokyo Denki University, Tokyo, Japan; Tokyo Denki University, Tokyo, Japan; Honda Research Institute Japan Co., Ltd., Wako-shi, Saitama, Japan; Honda Research Institute Japan Co., Ltd., Wako-shi, Saitama, Japan

  • Venue:
  • IROS '09: Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • Year:
  • 2009

Abstract

This paper proposes real-time sound source orientation estimation based on orientation-extended amplitude beamforming (OE-ABF). Recognizing the orientation of a sound source (such as a speaker's face orientation) is an important function for a robot that aims at natural human-robot interaction, because it lets the robot determine whether an utterance is directed at the robot itself or at another person. We previously developed a sound source orientation estimation system using orientation-extended beamforming (OE-BF) and showed that it works properly, at least in a specific controlled environment. In practical use, however, that system fails because it does not account for the differences between the model assumed in OE-BF and real acoustic conditions; for example, the model assumes that there is neither noise nor reverberation, which is not a realistic assumption. To solve this model mismatch problem, we propose sound source orientation estimation based on OE-ABF and construct a real-time estimation system that applies the proposed method to a 96-channel microphone array. Evaluation results show that the average error of the estimated angles is below 5°, whereas the error of our previously reported system exceeded 20°. With this accuracy, the robot can distinguish whether a person standing 1 m in front of it is addressing the robot itself or another person standing 0.2 m to the robot's left, which is valuable for human-robot interaction.
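
The abstract does not spell out the OE-ABF algorithm itself, but its core idea, scanning candidate source orientations and matching the amplitudes observed across a large microphone array against an orientation-dependent radiation model, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method: the cardioid directivity model, the circular 96-microphone geometry, and the function name orientation_scan are all hypothetical choices made for the example.

```python
import numpy as np

def orientation_scan(mic_xy, amp_obs, src_xy, n_orient=72):
    """Score candidate source orientations against observed per-mic
    amplitudes. Assumes (hypothetically) a cardioid directivity model
    with 1/r amplitude decay; the paper derives its own
    orientation-extended model."""
    # Vector, distance, and bearing from the source to each microphone.
    d = mic_xy - src_xy                       # shape (M, 2)
    dist = np.linalg.norm(d, axis=1)          # shape (M,)
    mic_bearing = np.arctan2(d[:, 1], d[:, 0])

    thetas = np.linspace(-np.pi, np.pi, n_orient, endpoint=False)
    scores = []
    for th in thetas:
        # Expected amplitude at each mic: cardioid gain times 1/r decay.
        gain = 0.5 * (1.0 + np.cos(mic_bearing - th))
        amp_model = gain / dist
        # Normalized correlation between modeled and observed amplitudes.
        num = amp_obs @ amp_model
        den = np.linalg.norm(amp_obs) * np.linalg.norm(amp_model) + 1e-12
        scores.append(num / den)
    scores = np.asarray(scores)
    return np.degrees(thetas[np.argmax(scores)]), scores

# Toy usage: 96 mics on a 2 m circle, source at the origin facing 30 deg.
M = 96
ang = np.linspace(0, 2 * np.pi, M, endpoint=False)
mic_xy = 2.0 * np.stack([np.cos(ang), np.sin(ang)], axis=1)
src_xy = np.zeros(2)

true_theta = np.radians(30.0)
d = mic_xy - src_xy
dist = np.linalg.norm(d, axis=1)
bearing = np.arctan2(d[:, 1], d[:, 0])
amp_obs = 0.5 * (1.0 + np.cos(bearing - true_theta)) / dist
amp_obs += 0.02 * np.random.default_rng(0).normal(size=M)  # sensor noise

est_deg, _ = orientation_scan(mic_xy, amp_obs, src_xy)
print(f"estimated orientation: {est_deg:.1f} deg")  # close to 30.0
```

The sketch also suggests why the paper's amplitude-based variant helps in practice: matching amplitude patterns via a normalized score is less sensitive to the noise and reverberation that break an idealized, mismatch-free model.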