Fundamentals of speech recognition
Affective computing
Automatic Analysis of Facial Expressions: The State of the Art
IEEE Transactions on Pattern Analysis and Machine Intelligence
Emotions and personality in agent design
Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1
Embodied contextual agent in information delivering application
Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 2
MAUI: a multimodal affective user interface
Proceedings of the tenth ACM international conference on Multimedia
Vision-Based Gesture Recognition: A Review
GW '99 Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction
“Put-that-there”: Voice and gesture at the graphics interface
SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques
Emotion Recognition Using a Cauchy Naive Bayes Classifier
ICPR '02 Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02) - Volume 1
Toward Multimodal Interpretation in a Natural Speech/Gesture Interface
ICIIS '99 Proceedings of the 1999 International Conference on Information Intelligence and Systems
Analysis of emotion recognition using facial expressions, speech and multimodal information
Proceedings of the 6th international conference on Multimodal interfaces
Online face detection and user authentication
Proceedings of the 13th annual ACM international conference on Multimedia
Product HMMs for audio-visual continuous speech recognition using facial animation parameters
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 1
Using noninvasive wearable computers to recognize human emotions from physiological signals
EURASIP Journal on Applied Signal Processing
Active affective state detection and user assistance with dynamic Bayesian networks
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
IVA '07 Proceedings of the 7th international conference on Intelligent Virtual Agents
Image and Vision Computing
Automatic temporal segment detection and affect recognition from face and body display
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on human computing
Multimodal information fusion application to human emotion recognition from face and speech
Multimedia Tools and Applications
Emotion recognition using bimodal data fusion
Proceedings of the 12th International Conference on Computer Systems and Technologies
Fusion of audio- and visual cues for real-life emotional human robot interaction
DAGM'11 Proceedings of the 33rd international conference on Pattern Recognition
Hybrid fusion approach for detecting affects from multichannel physiology
ACII'11 Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part I
Multimodal affect recognition in intelligent tutoring systems
ACII'11 Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part II
3D virtual worlds and the metaverse: Current status and future possibilities
ACM Computing Surveys (CSUR)
During face-to-face communication, it has been suggested that as much as 70% of what people convey when talking directly with others passes through paralanguage involving multiple modalities combined together (e.g. voice tone and volume, body language). In an attempt to make human-computer interaction more similar to human-human communication and enhance its naturalness, research on the sensory acquisition and interpretation of single modalities of human expression has seen steady progress over the last decade. This progress makes artificial sensor fusion of multiple modalities an increasingly important research domain: fusion can improve accuracy when the messages across modalities are congruent, and it can potentially detect incongruent messages across modalities (incongruence being itself a message about the nature of the information being conveyed). Accurate interpretation of emotional signals - quintessentially multimodal - would therefore particularly benefit from multimodal sensor fusion and interpretation algorithms. In this paper we review the state of the art in multimodal fusion and describe one way to implement a generic framework for multimodal emotion recognition. The system is developed within the MAUI framework [31] and Scherer's Component Process Theory (CPT) [49, 50, 51, 24, 52], with the goal of being modular and adaptive. We want the designed framework to accept different single- and multi-modality recognition systems and to automatically adapt the fusion algorithm to find optimal solutions. The system also aims to be adaptive to channel (and system) reliability.
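To make the fusion idea concrete, below is a minimal Python sketch of reliability-weighted late fusion over per-modality emotion estimates. All names here (ModalityEstimate, fuse_estimates, incongruent) and the label set are hypothetical illustrations, not part of MAUI or CPT; the sketch only assumes that each single-modality recognizer outputs a probability distribution over a shared set of emotion labels together with a channel reliability score.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed shared emotion label set; the actual framework's categories may differ.
EMOTIONS = ["anger", "joy", "sadness", "fear", "neutral"]

@dataclass
class ModalityEstimate:
    """Hypothetical output of one single-modality recognizer."""
    modality: str               # e.g. "face", "voice", "posture"
    probs: Dict[str, float]     # probability distribution over EMOTIONS
    reliability: float          # channel/system reliability in [0, 1]

def fuse_estimates(estimates: List[ModalityEstimate]) -> Dict[str, float]:
    """Reliability-weighted late fusion: a convex combination of the
    per-modality distributions, so less reliable channels contribute less."""
    total = sum(e.reliability for e in estimates)
    if total == 0.0:
        # No trusted channel: fall back to a uniform distribution.
        return {label: 1.0 / len(EMOTIONS) for label in EMOTIONS}
    return {
        label: sum(e.reliability * e.probs.get(label, 0.0) for e in estimates) / total
        for label in EMOTIONS
    }

def incongruent(estimates: List[ModalityEstimate], min_reliability: float = 0.5) -> bool:
    """Flag a possible cross-modal incongruence: reliable channels whose
    top-ranked labels disagree (the disagreement is itself informative)."""
    tops = [max(e.probs, key=e.probs.get)
            for e in estimates if e.reliability >= min_reliability]
    return len(set(tops)) > 1

# Example: the face channel suggests joy while the voice suggests anger.
face = ModalityEstimate("face", {"joy": 0.7, "neutral": 0.3}, reliability=0.9)
voice = ModalityEstimate("voice", {"anger": 0.6, "neutral": 0.4}, reliability=0.8)
print(fuse_estimates([face, voice]))  # blended distribution over EMOTIONS
print(incongruent([face, voice]))     # True: the two reliable channels disagree
```

The weighted average is only one simple combination rule; the framework's stated goal of automatically adapting the fusion algorithm would amount to learning or tuning such per-channel weights (or swapping the combination rule entirely) as channel and system reliability change.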