Are you looking at me, are you talking with me: multimodal classification of the focus of attention

  • Authors:
  • Christian Hacker;Anton Batliner;Elmar Nöth

  • Affiliations:
  • Chair for Pattern Recognition (Informatik 5), University of Erlangen-Nuremberg, Erlangen, Germany (all authors)

  • Venue:
  • TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
  • Year:
  • 2006

Abstract

Automatic dialogue systems get easily confused if speech is recognized that is not directed to the system. Besides noise or other people's conversation, even the user's own utterance can cause difficulties when he is talking to someone else or to himself (“Off-Talk”). In this paper, the automatic classification of the user's focus of attention is investigated. In the German SmartWeb project, a mobile device is used to access the Semantic Web. In this scenario, two modalities are available: the speech and the video signal. This makes it possible to classify whether a spoken request is addressed to the system or not: with the camera of the mobile device, the user's gaze direction is detected; in the speech signal, prosodic features are analyzed. Encouraging recognition rates of up to 93% are achieved in the speech-only condition. Further improvement is expected from the fusion of the two information sources.
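The abstract leaves the fusion step open. A common approach for combining two per-utterance classifiers is score-level (late) fusion, sketched below under assumptions not stated in the paper: each modality (gaze, prosody) is assumed to output a posterior score for "On-Talk" (request addressed to the system), and the scores are combined by a weighted average with an illustrative weight and threshold.

```python
# Hypothetical sketch of score-level (late) fusion of the two modalities:
# a gaze-based classifier and a prosody-based classifier each output a
# posterior score for On-Talk; weights and threshold are illustrative,
# not taken from the paper.

def fuse_scores(p_gaze: float, p_prosody: float, w_gaze: float = 0.5) -> float:
    """Weighted-average fusion of the two per-utterance posterior scores."""
    return w_gaze * p_gaze + (1.0 - w_gaze) * p_prosody

def is_on_talk(p_gaze: float, p_prosody: float, threshold: float = 0.5) -> bool:
    """Decide whether the utterance is addressed to the system (On-Talk)."""
    return fuse_scores(p_gaze, p_prosody) >= threshold

if __name__ == "__main__":
    # User looks at the device (high gaze score) but the prosody suggests
    # talking to himself (low prosody score): the fused score decides.
    print(is_on_talk(0.9, 0.2))  # fused score 0.55 -> True
```

In practice the weight would be tuned on held-out data; the paper's 93% speech-only result suggests the prosody score alone is already a strong input to such a scheme.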