A Graphical Model for Audiovisual Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence
A human being understands the environment by integrating information obtained through the senses of sight, hearing, and touch. To integrate information across different senses, a human being must find correspondences between events observed by different senses. We obtain image and sound signals through the senses of sight and hearing as afferent signals from the external world, together with copies of efferent signals (commands to the motor system), which carry information about the internal world. In this paper, we propose a method for relating multiple audio-visual events to an efferent signal (a motor command to the hand) according to general laws, without object-specific knowledge. As correspondence cues, we use the Gestalt grouping laws: simultaneity of sound onsets and changes in movement, and similarity of repetition between sound and movement. We conducted experiments in a real environment and obtained satisfactory results demonstrating the effectiveness of the proposed method.
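The simultaneity cue described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, event times, and tolerance below are assumptions chosen for demonstration. The idea is that sound onsets belonging to the tracked object should co-occur in time with changes in its movement:

```python
# Illustrative sketch (not the paper's method): scoring the "simultaneity"
# cue between detected sound onsets and detected changes in movement.
# All names, times, and the tolerance are hypothetical.

def simultaneity_score(sound_onsets, movement_changes, tolerance=0.1):
    """Return the fraction of sound onsets that co-occur (within
    `tolerance` seconds) with some change in movement."""
    if not sound_onsets:
        return 0.0
    matched = sum(
        1 for t in sound_onsets
        if any(abs(t - m) <= tolerance for m in movement_changes)
    )
    return matched / len(sound_onsets)

# Hypothetical event times in seconds for two candidate objects:
onsets = [0.50, 1.02, 1.48, 2.01]        # detected sound onsets
hand_motion = [0.52, 1.00, 1.50, 2.00]   # hand: changes align with onsets
other_motion = [0.10, 0.80, 1.70]        # unrelated object: no alignment

print(simultaneity_score(onsets, hand_motion))   # 1.0 (strong correspondence)
print(simultaneity_score(onsets, other_motion))  # 0.0 (no correspondence)
```

A real system would extract the onset and movement-change times from the audio and video streams; the score then ranks candidate objects by how well their motion explains the observed sound.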