Are you looking at me, are you talking with me: multimodal classification of the focus of attention

  • Authors:
  • Christian Hacker;Anton Batliner;Elmar Nöth

  • Affiliations:
  • Chair for Pattern Recognition (Informatik 5), University of Erlangen-Nuremberg, Erlangen, Germany (all authors)

  • Venue:
  • TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
  • Year:
  • 2006

Abstract

Automatic dialogue systems get easily confused if speech is recognized that is not directed to the system. Besides noise or other people's conversation, even the user's own utterance can cause difficulties when he is talking to someone else or to himself (“Off-Talk”). In this paper, the automatic classification of the user's focus of attention is investigated. In the German SmartWeb project, a mobile device is used to access the Semantic Web. In this scenario, two modalities are available: the speech and the video signal. This makes it possible to classify whether a spoken request is addressed to the system or not: with the camera of the mobile device, the user's gaze direction is detected; in the speech signal, prosodic features are analyzed. Encouraging recognition rates of up to 93% are achieved in the speech-only condition. Further improvement is expected from the fusion of the two information sources.
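The abstract leaves the fusion step open. A common approach for combining two per-utterance classifiers is score-level (late) fusion, sketched below under assumptions not stated in the paper: each modality (gaze, prosody) is assumed to output a posterior score for "On-Talk" (request addressed to the system), and the scores are combined by a weighted average with an illustrative weight and threshold.

```python
# Hypothetical sketch of score-level (late) fusion of the two modalities:
# a gaze-based classifier and a prosody-based classifier each output a
# posterior score for On-Talk; weights and threshold are illustrative,
# not taken from the paper.

def fuse_scores(p_gaze: float, p_prosody: float, w_gaze: float = 0.5) -> float:
    """Weighted-average fusion of the two per-utterance posterior scores."""
    return w_gaze * p_gaze + (1.0 - w_gaze) * p_prosody

def is_on_talk(p_gaze: float, p_prosody: float, threshold: float = 0.5) -> bool:
    """Decide whether the utterance is addressed to the system (On-Talk)."""
    return fuse_scores(p_gaze, p_prosody) >= threshold

if __name__ == "__main__":
    # User looks at the device (high gaze score) but the prosody suggests
    # talking to himself (low prosody score): the fused score decides.
    print(is_on_talk(0.9, 0.2))  # fused score 0.55 -> True
```

In practice the weight would be tuned on held-out data; the paper's 93% speech-only result suggests the prosody score alone is already a strong input to such a scheme.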