Neural network based feature transformation for emotion independent speaker identification

  • Authors:
  • Sreenivasa Rao Krothapalli;Jaynath Yadav;Sourjya Sarkar;Shashidhar G. Koolagudi;Anil Kumar Vuppala

  • Affiliations:
  • School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;IIIT, LTRC, Hyderabad, India

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we are proposing neural network based feature transformation framework for developing emotion independent speaker identification system. Most of the present speaker recognition systems may not perform well during emotional environments. In real life, humans extensively express emotions during conversations for effectively conveying the messages. Therefore, in this work we propose the speaker recognition system, robust to variations in emotional moods of speakers. Neural network models are explored to transform the speaker specific spectral features from any specific emotion to neutral. In this work, we have considered eight emotions namely, Anger, Sad, Disgust, Fear, Happy, Neutral, Sarcastic and Surprise. The emotional databases developed in Hindi, Telugu and German are used in this work for analyzing the effect of proposed feature transformation on the performance of speaker identification system. In this work, spectral features are represented by mel-frequency cepstral coefficients, and speaker models are developed using Gaussian mixture models. Performance of the speaker identification system is analyzed with various feature mapping techniques. Results have demonstrated that the proposed neural network based feature transformation has improved the speaker identification performance by 20 %. Feature transformation at the syllable level has shown the better performance, compared to sentence level.