Being bored? Recognising natural interest by extensive audiovisual integration for real-life application

Authors:
Björn Schuller;Ronald Müller;Florian Eyben;Jürgen Gast;Benedikt Hörnler;Martin Wöllmer;Gerhard Rigoll;Anja Höthker;Hitoshi Konosu
Affiliations:
Institute for Human-Machine Communication, Technische Universität München, D-80333 München, Germany;Altran Technologies, Bernhard-Wicki-Str. 3, 80636 München, Germany;Institute for Human-Machine Communication, Technische Universität München, D-80333 München, Germany;Institute for Human-Machine Communication, Technische Universität München, D-80333 München, Germany;Institute for Human-Machine Communication, Technische Universität München, D-80333 München, Germany;Institute for Human-Machine Communication, Technische Universität München, D-80333 München, Germany;Institute for Human-Machine Communication, Technische Universität München, D-80333 München, Germany;Toyota Motor Europe, Production Engineering - Advanced Technologies, B-1930 Zaventem, Belgium;Toyota Motor Corporation, 1 Toyota-cho, Toyota City, Aichi 471-8571, Japan
Venue:
Image and Vision Computing
Year:
2009

Abstract

Automatic detection of the level of human interest is highly relevant for many technical applications, such as automatic customer care or tutoring systems. However, recognising spontaneous interest in natural conversations independently of the subject remains a challenge. Identifying human affective states from a single modality is often impossible, even for humans, since different modalities contain partially disjunctive cues. Multimodal approaches to human affect recognition have generally been shown to boost recognition performance, yet they have been evaluated only in restrictive laboratory settings. Herein we introduce a fully automatic processing combination of Active-Appearance-Model-based facial expression, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process. We provide detailed subject-independent results for classification and regression of the Level of Interest using Support-Vector Machines on an audiovisual interest corpus (AVIC) consisting of spontaneous, conversational speech, demonstrating the "theoretical" effectiveness of the approach. Further, to evaluate the approach with regard to real-life usability, a user study is conducted as proof of "practical" effectiveness.
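
The two ideas the abstract combines, early feature fusion and subject-independent evaluation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The feature values, subject labels, and function names are hypothetical, and a simple nearest-centroid classifier stands in for the Support-Vector Machines used in the paper so that the example needs only the standard library.

```python
from collections import defaultdict


def fuse_early(*modality_vectors):
    """Early (feature-level) fusion: concatenate per-modality vectors."""
    fused = []
    for vec in modality_vectors:
        fused.extend(vec)
    return fused


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx); no subject is in both sets."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [a + b for a, b in zip(sums[label], x)]
        counts[label] += 1
    return {lab: [v / counts[lab] for v in tot] for lab, tot in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


# Hypothetical toy data: (subject, facial, acoustic, linguistic, label),
# where label 0 = low interest and 1 = high interest.
SAMPLES = [
    ("A", [0.1, 0.2], [0.2, 0.1], [0.0], 0),
    ("A", [0.9, 0.8], [0.8, 0.9], [1.0], 1),
    ("B", [0.2, 0.1], [0.1, 0.3], [0.0], 0),
    ("B", [0.8, 0.9], [0.9, 0.7], [1.0], 1),
    ("C", [0.0, 0.3], [0.3, 0.2], [0.0], 0),
    ("C", [1.0, 0.7], [0.7, 0.8], [1.0], 1),
]


def run_demo():
    """Fuse the modalities, then score each held-out subject in turn."""
    subjects = [s[0] for s in SAMPLES]
    X = [fuse_early(s[1], s[2], s[3]) for s in SAMPLES]
    y = [s[4] for s in SAMPLES]
    correct = 0
    for _, train, test in leave_one_subject_out(subjects):
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(SAMPLES)


if __name__ == "__main__":
    print("subject-independent accuracy:", run_demo())
```

Fusing before classification lets a single model weigh cues from all modalities jointly, which is the motivation the abstract gives for early fusion. Holding out every sample of one subject per fold is what makes the reported figure subject-independent.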