Automatic recognition and understanding of speech are crucial steps towards natural human-machine interaction. Beyond recognizing the word sequence, recognizing properties such as prosody, emotion tags, or stress tags can be of particular importance in this communication process. This paper discusses the possibilities for recognizing emotion from the speech signal, primarily from the viewpoint of automatic speech recognition (ASR). The general focus is on the extraction of acoustic features from the speech signal that can be used to detect the emotional or stress state of the speaker.

After the introduction, a short overview of the ASR framework is presented. Next, we discuss the relation between emotion recognition and ASR, and the different approaches in the literature that deal with the correspondence between emotions and acoustic features. The conclusion is that automatic emotional tagging of the speech signal is difficult to perform with high accuracy, but prosodic information is nevertheless potentially useful for improving dialogue handling in limited-domain ASR tasks.
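As a minimal illustration of the kind of acoustic feature extraction discussed above, the sketch below computes two common prosodic features, short-time energy and an autocorrelation-based pitch (F0) estimate, on a synthetic tone. The frame length, hop size, and pitch search range are illustrative choices, not the paper's method.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping analysis frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    """Mean squared amplitude per frame, a simple loudness correlate."""
    return np.mean(frames ** 2, axis=1)

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 as the autocorrelation peak in a plausible pitch range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag bounds for [fmin, fmax]
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Demo on a synthetic 200 Hz tone (stand-in for a voiced speech segment).
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)
frames = frame_signal(x, 640, 320)      # 40 ms frames, 20 ms hop
e = short_time_energy(frames)
f0 = autocorr_pitch(frames[0], sr)      # ≈ 200 Hz for this tone
```

Real emotion-recognition front ends typically track such features over time (contours, statistics per utterance) rather than per single frame, and use more robust pitch trackers, but the framing/feature structure is the same.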