Synching models with infants: a perceptual-level model of infant audio-visual synchrony detection

Authors:
Christopher G. Prince; George J. Hollich
Affiliations:
Department of Computer Science, University of Minnesota Duluth, Duluth, MN 55812, USA; Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907, USA
Venue:
Cognitive Systems Research
Year:
2005

Abstract

Synchrony detection between different sensory channels appears critically important for learning and cognitive development. In this paper we compare infant studies of audio-visual synchrony detection with a model of synchrony detection based on Gaussian mutual information [Hershey, J., & Movellan, J. (2000). Audio-vision: using audio-visual synchrony to locate sounds. In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 813-819). Cambridge, MA: MIT Press], augmented with methods for quantitative synchrony estimation. Five infant-model comparisons are presented, using stimuli covering a broad range of audio-visual integration types. While both infants and the model discriminated each type of stimulus, the model was most successful with stimuli comprising (a) synchronized punctuate motion and speech, (b) visually balanced left and right instances of the same person talking, with speech synchronized to only one side, and (c) two speech audio sources and a dynamic-face motion source. More difficult for the model were stimulus conditions with (d) left and right instances of two different people talking, with speech synchronized to only one side, and (e) two speech audio sources and more abstract visual dynamics (an oscilloscope instead of a face). As a first approximation, this model of synchrony detection using low-level sensory features (e.g., RMS audio, grayscale pixels) is a candidate for a mechanism used by infants in detecting audio-visual synchrony.
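To make the idea of Gaussian mutual information over low-level features concrete, the following is a minimal sketch, not the authors' implementation. It relies on the standard result that for two jointly Gaussian scalar signals with correlation coefficient rho, the mutual information is -0.5 * ln(1 - rho^2). The function names (gaussian_mutual_information, synchrony_map), the flat per-frame pixel layout, and the toy data are all assumptions introduced for illustration. The abstract does not detail the quantitative synchrony estimation methods, so they are not reproduced here.

```python
import math
import random


def gaussian_mutual_information(xs, ys):
    """Mutual information (in nats) between two scalar sequences,
    assuming they are jointly Gaussian: I = -0.5 * ln(1 - rho^2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant signal carries no information about the other
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Clamp rho^2 just below 1 so the logarithm stays finite.
    rho_sq = min(cov * cov / (var_x * var_y), 1.0 - 1e-12)
    return -0.5 * math.log(1.0 - rho_sq)


def synchrony_map(audio_rms, pixel_frames):
    """Return one mutual-information value per pixel.

    audio_rms[t] is the RMS audio level at frame t (hypothetical input);
    pixel_frames[t][p] is the grayscale value of pixel p at frame t.
    """
    n_pixels = len(pixel_frames[0])
    return [
        gaussian_mutual_information(audio_rms, [frame[p] for frame in pixel_frames])
        for p in range(n_pixels)
    ]


# Toy data (fabricated for illustration only): pixel 0 tracks the audio
# level plus noise, while pixel 1 varies independently of the audio.
rng = random.Random(0)
audio = [rng.random() for _ in range(200)]
frames = [[a + 0.1 * rng.gauss(0, 1), rng.random()] for a in audio]
mi = synchrony_map(audio, frames)
```

In this toy setup the pixel that follows the audio level receives a much higher mutual-information value than the independent pixel. Comparing such values between the left and right halves of an image is one plausible way to ask which side is synchronized with the sound, though that comparison step is also an assumption here.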