Audiovisual recognition of spontaneous interest within conversations

  • Authors:
  • Björn Schuller; Ronald Müller; Benedikt Hörnler; Anja Höthker; Hitoshi Konosu; Gerhard Rigoll

  • Affiliations:
  • Technische Universität München, Munich, Germany; Technische Universität München, Munich, Germany; Technische Universität München, Munich, Germany; Toyota Motor Europe, Zaventem, Belgium; Toyota Motor Corporation, Toyota City, Japan; Technische Universität München, Munich, Germany

  • Venue:
  • Proceedings of the 9th international conference on Multimodal interfaces
  • Year:
  • 2007

Abstract

In this work we present an audiovisual approach to the recognition of spontaneous interest in human conversations. To obtain a maximally robust estimate, information from four sources is combined by a synergistic fusion that tolerates the failure of individual streams. First, speech is analyzed with respect to acoustic properties in a high-dimensional prosodic, articulatory, and voice-quality feature space, complemented by linguistic analysis of the spoken content via LVCSR and bag-of-words vector-space modeling that includes non-verbal vocalizations. Second, visual analysis captures patterns of facial expression by Active Appearance Models (AAMs) and of movement activity by eye tracking. Experiments are based on a database of 10.5 h of spontaneous human-to-human conversation with 20 subjects, balanced in gender and age class. Recordings were made with a room microphone, a camera, and close-talk headsets to cover diverse comfort and noise conditions. Three levels of interest were annotated within a rich transcription. We describe each information stream and an early-level fusion in detail. Our experiments aim at a person-independent system for real-life use and show the high potential of such a multimodal approach. Benchmark results based on manual transcription versus fully automatic processing are also provided.
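The early (feature-level) fusion and bag-of-words modeling mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature values, vocabulary, and stream names below are invented for demonstration, and the actual system uses far higher-dimensional features per stream.

```python
# Illustrative sketch of early (feature-level) fusion across modalities.
# All values and names here are hypothetical, not taken from the paper.

def bag_of_words(tokens, vocabulary):
    """Map a token sequence (including non-verbals such as '<laugh>')
    to a fixed-length term-count vector over a given vocabulary."""
    counts = {word: 0 for word in vocabulary}
    for token in tokens:
        if token in counts:
            counts[token] += 1
    return [counts[word] for word in vocabulary]

def early_fusion(*streams):
    """Concatenate per-stream feature vectors into one joint vector,
    which a single classifier would then operate on."""
    fused = []
    for stream in streams:
        fused.extend(stream)
    return fused

# Hypothetical per-stream features for one conversational segment:
acoustic = [0.12, 0.87, 0.45]    # e.g. prosodic / voice-quality features
linguistic = bag_of_words(
    ["i", "see", "<laugh>"],     # recognized words plus a non-verbal
    ["i", "see", "<laugh>", "no"]
)
facial = [0.30, -0.10]           # e.g. AAM model parameters
gaze = [0.05]                    # e.g. eye-movement activity measure

joint = early_fusion(acoustic, linguistic, facial, gaze)
print(len(joint))  # 3 + 4 + 2 + 1 = 10
```

The design point of early fusion is that cross-modal correlations remain visible to the classifier, at the cost of a higher-dimensional input; a stream that fails (e.g. the face is occluded) can be handled by substituting neutral values for its slice of the joint vector.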