This paper describes the acquisition and content of a new multi-modal database, together with some tools for making use of its data streams. The Computational Audio-Visual Analysis (CAVA) database is a unique collection of three synchronised data streams obtained from a binaural microphone pair, a stereoscopic camera pair and a head-tracking device. All recordings are made from the perspective of a person, i.e., what a human with natural head movements would see and hear in a given environment. The database is intended to facilitate research into humans' ability to optimise their multi-modal sensory input, and it fills a gap by providing data that enables human-centred audio-visual scene analysis. It also enables 3D localisation using audio, visual, or combined audio-visual cues. A total of 50 sessions, with varying degrees of visual and auditory complexity, were recorded. These range from seeing and hearing a single speaker moving in and out of the field of view, to moving around a 'cocktail party' style situation, mingling with and joining different small groups of people chatting.
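As an illustration of the kind of visual 3D localisation the stereoscopic camera pair supports, the following sketch computes depth from disparity for a rectified stereo pair. This is a minimal, standard example, not one of the CAVA tools; the focal length and baseline values are hypothetical placeholders, not the database's actual calibration.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth of a point from a rectified stereo pair: Z = f * b / d,
    where f is the focal length in pixels, b the camera baseline in
    metres, and d the horizontal disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical calibration: f = 700 px, baseline = 0.12 m.
# A feature seen with 42 px disparity lies at 700 * 0.12 / 42 = 2.0 m.
z = depth_from_disparity(42.0, 700.0, 0.12)
print(round(z, 2))  # 2.0
```

Audio-only localisation from the binaural pair would instead rely on interaural time and level differences, and audio-visual localisation on fusing the two cues.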