A robot's auditory perception of the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper presents the active direction-pass filter (ADPF), which separates sounds originating from a specified direction detected with a pair of microphones. The ADPF is thus based on directional processing, a technique also used in visual processing. It is implemented by hierarchically integrating visual and auditory processing with hypothetical reasoning about the interaural phase difference (IPD) and interaural intensity difference (IID) of each sub-band. The resolution of the ADPF in sound localization and separation depends on where the sound comes from: the resolving power is much higher for sounds arriving from directly in front of the humanoid than for sounds arriving from the periphery. This directional resolving property is similar to that of the eye, whose visual fovea at the center of the retina provides much higher resolution than the periphery of the retina. To exploit the corresponding "auditory fovea", the ADPF controls the direction of the head. Human tracking and sound source separation based on the ADPF are implemented on an upper-torso humanoid and run in real time, using distributed processing on five PCs networked via Gigabit Ethernet. When separating a mixture of two or three simultaneous speech signals of the same loudness, the ADPF improved the signal-to-noise ratio (SNR) of each separated sound by about 2.2 dB and the noise reduction ratio by about 9 dB.
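The core idea of a direction-pass filter can be sketched in a few lines: for each sub-band of a short-time spectrum, compare the observed IPD between the two microphones against the IPD expected for the target direction, and keep only the sub-bands that match. The sketch below is a minimal, hypothetical illustration under a free-field two-microphone model; the microphone spacing, FFT size, and tolerance are illustrative values, not parameters from the paper, and the hierarchical audio-visual integration and IID reasoning of the actual ADPF are omitted.

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_angle_deg,
                          mic_distance=0.3, n_fft=512, tol=0.2):
    """Hypothetical sketch of a direction-pass filter.

    Keeps the sub-bands whose interaural phase difference (IPD)
    matches the IPD expected for the target direction, and
    suppresses the rest.  Parameter values are illustrative.
    """
    c = 343.0                     # speed of sound in air (m/s)
    hop = n_fft // 2
    win = np.hanning(n_fft)

    # Expected inter-microphone time delay for the target direction
    # (free-field model; 0 deg = straight ahead).
    tau = mic_distance * np.sin(np.radians(target_angle_deg)) / c
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    expected_ipd = 2 * np.pi * freqs * tau

    out = np.zeros(len(left))
    norm = np.zeros(len(left))
    for start in range(0, len(left) - n_fft + 1, hop):
        L = np.fft.rfft(win * left[start:start + n_fft])
        R = np.fft.rfft(win * right[start:start + n_fft])
        # Observed IPD per sub-band, and its deviation from the
        # expected IPD, wrapped to (-pi, pi].
        ipd = np.angle(L * np.conj(R))
        diff = np.angle(np.exp(1j * (ipd - expected_ipd)))
        # Binary mask: pass sub-bands whose IPD fits the direction.
        mask = (np.abs(diff) < tol).astype(float)
        out[start:start + n_fft] += np.fft.irfft(mask * L) * win
        norm[start:start + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

For a source directly in front (identical signals on both channels), the observed IPD is zero, every sub-band passes, and the input is reconstructed; steering the filter toward a different direction rejects those sub-bands instead. The paper's "auditory fovea" corresponds to the fact that the IPD changes fastest with angle near the front, so such a filter discriminates directions most sharply there, which motivates turning the head toward the target.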