Speaker recognition performance in emotional talking environments is lower than in neutral talking environments. This work proposes, implements, and evaluates a new approach to enhancing speaker identification performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as classifiers. The approach has been tested on our collected emotional speech database, which covers six emotions. The results show that speaker identification performance based on both gender and emotion cues exceeds that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22 %, 4.45 %, and 19.56 %, respectively. This work also shows that speaker identification performance is optimal when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in emotional talking environments. The average speaker identification performance achieved by the proposed approach falls within 2.35 % of that obtained in a subjective evaluation by human judges.
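The scoring scheme described above can be illustrated with a short sketch. It assumes each trained model (acoustic HMM or suprasegmental SPHMM) exposes a log-likelihood scorer, and that a weight alpha blends the two model families, with alpha = 1 corresponding to the complete bias towards suprasegmental models reported above. All names below (DummyModel, identify_speaker, log_likelihood) are hypothetical placeholders, not the authors' implementation.

import numpy as np

class DummyModel:
    """Stand-in for a trained HMM or SPHMM: scores a feature vector with
    a diagonal-Gaussian log-likelihood. Purely illustrative."""
    def __init__(self, mean, var=1.0):
        self.mean = np.asarray(mean, dtype=float)
        self.var = var

    def log_likelihood(self, feats):
        d = np.asarray(feats, dtype=float) - self.mean
        return float(-0.5 * np.sum(d * d) / self.var)

def identify_speaker(feats, gender_models, emotion_models, speaker_models,
                     alpha=1.0):
    """Two-stage identification: infer gender, then emotion, then score
    the speakers enrolled under that (gender, emotion) condition.

    alpha weighs the suprasegmental (SPHMM) score against the acoustic
    (HMM) score; alpha = 1.0 mirrors the reported optimum of complete
    bias towards the suprasegmental models."""
    # Stage 1: most likely gender, then most likely emotion given gender.
    gender = max(gender_models,
                 key=lambda g: gender_models[g].log_likelihood(feats))
    emotion = max(emotion_models[gender],
                  key=lambda e: emotion_models[gender][e].log_likelihood(feats))

    # Stage 2: weighted HMM/SPHMM score for each candidate speaker.
    best_speaker, best_score = None, -np.inf
    for speaker, (hmm, sphmm) in speaker_models[(gender, emotion)].items():
        score = ((1.0 - alpha) * hmm.log_likelihood(feats)
                 + alpha * sphmm.log_likelihood(feats))
        if score > best_score:
            best_speaker, best_score = speaker, score
    return best_speaker, gender, emotion

# Toy usage with two speakers enrolled under the (male, angry) condition.
feats = np.array([0.9, 1.1])
gender_models = {"male": DummyModel([1.0, 1.0]),
                 "female": DummyModel([-1.0, -1.0])}
emotion_models = {"male": {"angry": DummyModel([1.0, 1.0])},
                  "female": {"angry": DummyModel([-1.0, -1.0])}}
speaker_models = {("male", "angry"): {
    "spk1": (DummyModel([1.0, 1.0]), DummyModel([0.9, 1.1])),
    "spk2": (DummyModel([2.0, 2.0]), DummyModel([2.0, 2.0]))}}
print(identify_speaker(feats, gender_models, emotion_models, speaker_models))
# -> ('spk1', 'male', 'angry')

With alpha = 1.0 the acoustic HMM scores are ignored entirely, which is the configuration the abstract reports as optimal; lowering alpha would re-introduce the acoustic models' contribution.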