Using vision to improve sound source separation

  • Authors:
  • Yukiko Nakagawa;Hiroshi G. Okuno;Hiroaki Kitano


  • Venue:
  • AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference
  • Year:
  • 1999


Abstract

We present a method of improving sound source separation using vision. Sound source separation is an essential function for auditory scene understanding: it separates the streams of sound generated by multiple sound sources. Once the streams are separated, a recognition process such as speech recognition can work on a single stream rather than on the mixed sound of several speakers. Separation performance is known to improve with stereo/binaural microphones and microphone arrays, which provide spatial information for separation; however, these methods still leave positional ambiguities of more than 20 degrees. In this paper, we further add visual information to provide more specific and accurate position information, and as a result separation capability is drastically improved. In addition, we found that using approximate direction information drastically improves the object tracking accuracy of a simple vision system, which in turn improves the performance of the auditory system. We claim that integrating visual and auditory inputs improves the performance of tasks in each perceptual modality, such as sound source separation and object tracking, by bootstrapping.
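
The abstract describes using a visual position estimate to narrow the roughly 20-degree directional ambiguity left by binaural and array-based localization. The following is a minimal, illustrative Python sketch of that idea; the function name, the weighted-average fusion rule, and all numeric tolerances are assumptions for illustration, not the paper's actual algorithm.

    # Minimal sketch (not the authors' system): a visual direction estimate
    # resolves the coarse (~20 degree) ambiguity of an audio direction estimate.
    # All names, tolerances, and the weighted fusion rule are assumptions.

    def fuse_directions(audio_deg, audio_tol_deg, visual_deg, visual_tol_deg):
        """Return a refined source direction in degrees.

        audio_deg  : direction from interaural/array cues, ambiguous within +/- audio_tol_deg
        visual_deg : direction of the tracked object from the vision system
        """
        # If the visual estimate falls inside the audio ambiguity window,
        # weight each cue by the inverse of its tolerance.
        if abs(visual_deg - audio_deg) <= audio_tol_deg:
            w_audio = 1.0 / audio_tol_deg
            w_visual = 1.0 / visual_tol_deg
            return (w_audio * audio_deg + w_visual * visual_deg) / (w_audio + w_visual)
        # Otherwise the visual cue likely belongs to another object; keep the audio estimate.
        return audio_deg

    # Example: audio says ~30 deg +/- 20, vision says 38 deg +/- 5 -> ~36.4 deg
    print(fuse_directions(30.0, 20.0, 38.0, 5.0))

The refined direction can then be fed back to the separation front end, and, in the other direction, the coarse audio estimate can restrict where the vision system searches for the object, which is the bootstrapping effect the abstract claims.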