Automatic dialogue systems get easily confused if speech is recognized that is not directed to the system. Besides noise or other people's conversations, even the user's own utterances can cause difficulties when the user is talking to someone else or to himself ("Off-Talk"). In this paper, the automatic classification of the user's focus of attention is investigated. In the German SmartWeb project, a mobile device is used to access the Semantic Web. In this scenario, two modalities are available: the speech signal and the video signal. This makes it possible to classify whether a spoken request is addressed to the system or not: with the camera of the mobile device, the user's gaze direction is detected; in the speech signal, prosodic features are analyzed. Encouraging recognition rates of up to 93% are achieved in the speech-only condition. Further improvement is expected from the fusion of the two information sources.
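To illustrate the kind of fusion the abstract anticipates, the following is a minimal sketch of score-level (late) fusion of the two modalities. It assumes each modality has already produced a posterior probability that the utterance is On-Talk; the function names, the fusion weight, and the decision threshold are illustrative assumptions, not the authors' actual method.

    # Minimal sketch of score-level fusion for On-Talk vs. Off-Talk
    # classification. All names and parameter values are hypothetical.

    def fuse_scores(p_ontalk_prosody: float, p_ontalk_gaze: float,
                    weight: float = 0.7) -> float:
        """Weighted average of per-modality posterior probabilities
        that an utterance is addressed to the system (On-Talk)."""
        return weight * p_ontalk_prosody + (1.0 - weight) * p_ontalk_gaze

    def classify(p_ontalk_prosody: float, p_ontalk_gaze: float,
                 threshold: float = 0.5) -> str:
        """Fuse both modality scores and apply a decision threshold."""
        p = fuse_scores(p_ontalk_prosody, p_ontalk_gaze)
        return "On-Talk" if p >= threshold else "Off-Talk"

    # Example: prosody strongly suggests the user addresses the system,
    # while the gaze estimate says the user is looking away from the device.
    print(classify(0.9, 0.3))  # -> "On-Talk" (0.7*0.9 + 0.3*0.3 = 0.72)

In practice the fusion weight would be tuned on held-out data, and more elaborate schemes (e.g., training a second-stage classifier on both scores) are possible; the sketch only shows the simplest weighted combination of the prosodic and gaze-based evidence.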