Speaker recognition performance in emotional talking environments is lower than in neutral talking environments. This work proposes, implements, and evaluates a new approach to enhancing speaker identification performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as classifiers. The approach has been tested on our collected emotional speech database, which covers six emotions. The results show that speaker identification performance based on both gender and emotion cues exceeds that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22 %, 4.45 %, and 19.56 %, respectively. This work also shows that speaker identification performance is optimal when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in emotional talking environments. The average speaker identification performance achieved by the proposed approach falls within 2.35 % of that obtained in a subjective evaluation by human judges.
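The scoring scheme described above can be illustrated with a short sketch. It assumes each trained model (acoustic HMM or suprasegmental SPHMM) exposes a log-likelihood scorer, and that a weight alpha blends the two model families, with alpha = 1 corresponding to the complete bias towards suprasegmental models reported above. All names below (DummyModel, identify_speaker, log_likelihood) are hypothetical placeholders, not the authors' implementation.

import numpy as np

class DummyModel:
    """Stand-in for a trained HMM or SPHMM: scores a feature vector with
    a diagonal-Gaussian log-likelihood. Purely illustrative."""
    def __init__(self, mean, var=1.0):
        self.mean = np.asarray(mean, dtype=float)
        self.var = var

    def log_likelihood(self, feats):
        d = np.asarray(feats, dtype=float) - self.mean
        return float(-0.5 * np.sum(d * d) / self.var)

def identify_speaker(feats, gender_models, emotion_models, speaker_models,
                     alpha=1.0):
    """Two-stage identification: infer gender, then emotion, then score
    the speakers enrolled under that (gender, emotion) condition.

    alpha weighs the suprasegmental (SPHMM) score against the acoustic
    (HMM) score; alpha = 1.0 mirrors the reported optimum of complete
    bias towards the suprasegmental models."""
    # Stage 1: most likely gender, then most likely emotion given gender.
    gender = max(gender_models,
                 key=lambda g: gender_models[g].log_likelihood(feats))
    emotion = max(emotion_models[gender],
                  key=lambda e: emotion_models[gender][e].log_likelihood(feats))

    # Stage 2: weighted HMM/SPHMM score for each candidate speaker.
    best_speaker, best_score = None, -np.inf
    for speaker, (hmm, sphmm) in speaker_models[(gender, emotion)].items():
        score = ((1.0 - alpha) * hmm.log_likelihood(feats)
                 + alpha * sphmm.log_likelihood(feats))
        if score > best_score:
            best_speaker, best_score = speaker, score
    return best_speaker, gender, emotion

# Toy usage with two speakers enrolled under the (male, angry) condition.
feats = np.array([0.9, 1.1])
gender_models = {"male": DummyModel([1.0, 1.0]),
                 "female": DummyModel([-1.0, -1.0])}
emotion_models = {"male": {"angry": DummyModel([1.0, 1.0])},
                  "female": {"angry": DummyModel([-1.0, -1.0])}}
speaker_models = {("male", "angry"): {
    "spk1": (DummyModel([1.0, 1.0]), DummyModel([0.9, 1.1])),
    "spk2": (DummyModel([2.0, 2.0]), DummyModel([2.0, 2.0]))}}
print(identify_speaker(feats, gender_models, emotion_models, speaker_models))
# -> ('spk1', 'male', 'angry')

With alpha = 1.0 the acoustic HMM scores are ignored entirely, which is the configuration the abstract reports as optimal; lowering alpha would re-introduce the acoustic models' contribution.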