The CAVA corpus: synchronised stereoscopic and binaural datasets with head movements

  • Authors:
  • Elise Arnaud; Heidi Christensen; Yan-Chen Lu; Jon Barker; Vasil Khalidov; Miles Hansard; Bertrand Holveck; Hervé Mathieu; Ramya Narasimha; Elise Taillant; Florence Forbes; Radu Horaud

  • Affiliations:
  • Université Joseph Fourier and INRIA Rhône-Alpes, Grenoble, France; University of Sheffield, Sheffield, United Kingdom; University of Sheffield, Sheffield, United Kingdom; University of Sheffield, Sheffield, United Kingdom; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France; INRIA Rhône-Alpes, Grenoble, France

  • Venue:
  • ICMI '08 Proceedings of the 10th international conference on Multimodal interfaces
  • Year:
  • 2008

Abstract

This paper describes the acquisition and content of a new multi-modal database, and presents some tools for making use of its data streams. The Computational Audio-Visual Analysis (CAVA) database is a unique collection of three synchronised data streams obtained from a binaural microphone pair, a stereoscopic camera pair and a head-tracking device. All recordings are made from the perspective of a person; that is, they capture what a human with natural head movements would see and hear in a given environment. The database is intended to facilitate research into humans' ability to optimise their multi-modal sensory input, and it fills a gap by providing data that enables human-centred audio-visual scene analysis. It also enables 3D localisation using audio, visual, or audio-visual cues. A total of 50 sessions, with varying degrees of visual and auditory complexity, were recorded. These range from seeing and hearing a single speaker moving in and out of the field of view, to moving around a 'cocktail party' style situation, mingling and joining different small groups of people chatting.