Emotional speech classification using hidden conditional random fields

  • Authors:
  • La The Vinh;Sungyoung Lee;Young-Koo Lee

  • Affiliations:
  • Kyung Hee University, Korea;Kyung Hee University, Korea;Kyung Hee University, Korea

  • Venue:
  • Proceedings of the Second Symposium on Information and Communication Technology
  • Year:
  • 2011

Abstract

Although a great many papers address emotional speech recognition, most of them contribute to the feature extraction phase. For the classification algorithm, the hidden Markov model (HMM) is still the most commonly used method, even though HMM has been shown to be less accurate than its discriminative counterpart, the hidden conditional random fields (HCRF) model, in tasks such as phone classification and gesture recognition. In this study, we therefore investigate the use of the HCRF model for the emotional speech classification problem. In our experiments, we extracted Mel-frequency cepstral coefficient (MFCC) features from the well-known Berlin emotional speech dataset (EMO) and the eNTERFACE 2005 dataset. We then used 10-fold cross-validation to train and evaluate the model and to compare our results with those of HMM. The experiments show that HCRF achieves a significant improvement (p-value ≤ 0.05) in classification accuracy. In addition, we speed up the training phase of the model by caching the gradient computation, so our computation time is much less than that of the existing methods.
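
The evaluation protocol described above (10-fold cross-validation followed by a significance check at the 0.05 level) might look roughly like the sketch below. The abstract does not say which statistical test was used, so the paired t-test here is an assumption. The function names, the sample count of 100, and the per-fold accuracies are hypothetical placeholders for illustration and are not results from the paper.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic computed on per-fold differences (a minus b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


# Two-sided critical value of Student's t at alpha = 0.05 with 9 degrees
# of freedom (10 folds minus 1).
T_CRIT_DF9 = 2.262

# Placeholder dataset size; each fold serves once as the held-out test set.
folds = k_fold_indices(100, k=10)

# Hypothetical per-fold accuracies, invented for illustration only.
hcrf_acc = [0.80, 0.78, 0.82, 0.79, 0.81, 0.77, 0.83, 0.80, 0.79, 0.81]
hmm_acc = [0.74, 0.75, 0.76, 0.73, 0.77, 0.72, 0.78, 0.74, 0.75, 0.73]

t_stat = paired_t_statistic(hcrf_acc, hmm_acc)
significant = abs(t_stat) > T_CRIT_DF9  # equivalent to p-value <= 0.05
```

In this sketch both classifiers are scored on the same ten folds, which is why a paired test applies: it works on the per-fold differences, so variation in difficulty from fold to fold cancels out.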