Emotions, speech and the ASR framework

Authors:
Louis ten Bosch
Affiliations:
A2RT, Department of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands
Venue:
Speech Communication - Special issue on speech and emotion
Year:
2003

Abstract

Automatic recognition and understanding of speech are crucial steps towards natural human-machine interaction. Apart from the recognition of the word sequence, the recognition of properties such as prosody, emotion tags or stress tags may be of particular importance in this communication process. This paper discusses the possibilities for recognizing emotion from the speech signal, primarily from the viewpoint of automatic speech recognition (ASR). The general focus is on the extraction of acoustic features from the speech signal that can be used to detect the emotional state or stress state of the speaker.

After the introduction, a short overview of the ASR framework is presented. Next, we discuss the relation between the recognition of emotion and ASR, and the different approaches found in the literature that deal with the correspondence between emotions and acoustic features. The conclusion is that automatic emotional tagging of the speech signal is difficult to perform with high accuracy, but that prosodic information is nevertheless potentially useful for improving dialogue handling in limited-domain ASR tasks.