A robot's auditory perception of the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper presents the active direction-pass filter (ADPF), which separates sounds originating from a specified direction detected with a pair of microphones. The ADPF is thus based on directional processing, a technique also used in visual processing. It is implemented by hierarchically integrating visual and auditory processing with hypothetical reasoning about the interaural phase difference (IPD) and interaural intensity difference (IID) of each sub-band. The resolution of the ADPF in sound localization and separation depends on where the sound comes from: the resolving power is much higher for sounds arriving from directly in front of the humanoid than for sounds arriving from the periphery. This directional resolving property is similar to that of the eye, whose visual fovea at the center of the retina provides much higher resolution than the periphery of the retina. To exploit the corresponding "auditory fovea", the ADPF controls the direction of the head. Human tracking and sound source separation based on the ADPF are implemented on an upper-torso humanoid and run in real time, using distributed processing on five PCs networked via Gigabit Ethernet. When separating a mixture of two or three simultaneous speech signals of the same loudness, the ADPF improved the signal-to-noise ratio (SNR) of each separated sound by about 2.2 dB and the noise reduction ratio by about 9 dB.
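The core idea of a direction-pass filter can be sketched in a few lines: for each sub-band of a short-time spectrum, compare the observed IPD between the two microphones against the IPD expected for the target direction, and keep only the sub-bands that match. The sketch below is a minimal, hypothetical illustration under a free-field two-microphone model; the microphone spacing, FFT size, and tolerance are illustrative values, not parameters from the paper, and the hierarchical audio-visual integration and IID reasoning of the actual ADPF are omitted.

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_angle_deg,
                          mic_distance=0.3, n_fft=512, tol=0.2):
    """Hypothetical sketch of a direction-pass filter.

    Keeps the sub-bands whose interaural phase difference (IPD)
    matches the IPD expected for the target direction, and
    suppresses the rest.  Parameter values are illustrative.
    """
    c = 343.0                     # speed of sound in air (m/s)
    hop = n_fft // 2
    win = np.hanning(n_fft)

    # Expected inter-microphone time delay for the target direction
    # (free-field model; 0 deg = straight ahead).
    tau = mic_distance * np.sin(np.radians(target_angle_deg)) / c
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    expected_ipd = 2 * np.pi * freqs * tau

    out = np.zeros(len(left))
    norm = np.zeros(len(left))
    for start in range(0, len(left) - n_fft + 1, hop):
        L = np.fft.rfft(win * left[start:start + n_fft])
        R = np.fft.rfft(win * right[start:start + n_fft])
        # Observed IPD per sub-band, and its deviation from the
        # expected IPD, wrapped to (-pi, pi].
        ipd = np.angle(L * np.conj(R))
        diff = np.angle(np.exp(1j * (ipd - expected_ipd)))
        # Binary mask: pass sub-bands whose IPD fits the direction.
        mask = (np.abs(diff) < tol).astype(float)
        out[start:start + n_fft] += np.fft.irfft(mask * L) * win
        norm[start:start + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

For a source directly in front (identical signals on both channels), the observed IPD is zero, every sub-band passes, and the input is reconstructed; steering the filter toward a different direction rejects those sub-bands instead. The paper's "auditory fovea" corresponds to the fact that the IPD changes fastest with angle near the front, so such a filter discriminates directions most sharply there, which motivates turning the head toward the target.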