Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

  • Authors:
  • Björn Schuller; Anton Batliner; Stefan Steidl; Dino Seppi

  • Affiliations:
  • Institute for Human-Machine Communication, Technische Universität München, Germany; Pattern Recognition Lab, University of Erlangen-Nuremberg, Germany; Pattern Recognition Lab, University of Erlangen-Nuremberg, Germany; ESAT, Katholieke Universiteit Leuven, Belgium

  • Venue:
  • Speech Communication
  • Year:
  • 2011

Abstract

More than a decade has passed since research on automatic recognition of emotion from speech became a field of research in its own right, in line with its 'big brothers' speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can tell us about where to go next and how we could get there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then shift to automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration. From there we turn to the first comparative challenge on emotion recognition from speech - the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors - covering the Challenge's database, the Sub-Challenges, the participants and their approaches, the winners, the fusion of results, and the lessons actually learnt, before we finally address the everlasting problems and promising future directions.