Spoken emotion recognition through optimum-path forest classification using glottal features

  • Authors:
  • Alexander I. Iliev; Michael S. Scordilis; João P. Papa; Alexandre X. Falcão

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA (Iliev, Scordilis); Institute of Computing, University of Campinas, Campinas, São Paulo, Brazil (Papa, Falcão)

  • Venue:
  • Computer Speech & Language
  • Year:
  • 2010

Abstract

A new method for the recognition of spoken emotions is presented, based on features of the glottal airflow signal. Its effectiveness is tested with the recently introduced optimum-path forest (OPF) classifier as well as with six established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network multilayer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC), and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment from ten speakers (five male, five female), each speaking ten sentences in four different emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths, computed both from the glottal and from the corresponding speech signal. Experimental results indicate that the best performance is obtained with glottal-only features, with SVM and OPF generally providing the highest recognition rates, whereas GMM and the combination of glottal and speech features performed comparatively worse. For this text-dependent, multi-speaker task, the top-performing classifiers achieved perfect recognition rates with 6th-order glottal MFCCs.
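
To illustrate the extraction step described in the abstract, the sketch below applies LPC-based inverse filtering to a voiced frame to approximate the glottal flow, then computes a glottal symmetry measure and 6th-order glottal MFCCs. This is a minimal Python sketch assuming librosa and SciPy; the LPC order, frame selection, GCI detection, and the peak-position symmetry definition are illustrative assumptions, not the authors' exact pipeline, and `utterance.wav` is a hypothetical input file.

```python
# Sketch only: LPC inverse filtering as a stand-in for the paper's
# glottal waveform extraction; parameters below are assumptions.
import numpy as np
from scipy.signal import lfilter, find_peaks
import librosa

def glottal_flow(frame, lpc_order=18):
    """LPC inverse filtering: the residual approximates the glottal flow
    derivative; integrating it gives a rough glottal flow estimate."""
    a = librosa.lpc(frame, order=lpc_order)   # vocal-tract model 1/A(z)
    dflow = lfilter(a, [1.0], frame)          # inverse filter -> residual
    return np.cumsum(dflow), dflow            # crude integration to flow

def glottal_symmetry(flow, dflow, fs):
    """Opening-to-closing time ratio within one glottal cycle; GCIs are
    taken as strong negative peaks of the flow derivative (a crude
    stand-in for the paper's definition)."""
    gci, _ = find_peaks(-dflow, distance=int(0.002 * fs))
    if len(gci) < 2:
        return np.nan
    cycle = flow[gci[0]:gci[1]]
    t_max = int(np.argmax(cycle))             # instant of maximum flow
    return t_max / max(len(cycle) - t_max, 1)

fs = 16000
speech, _ = librosa.load("utterance.wav", sr=fs)  # hypothetical input
n = int(0.03 * fs)                                # 30 ms analysis frame
frame = speech[2 * n:3 * n] * np.hanning(n)       # assumed voiced region
flow, dflow = glottal_flow(frame)
mfcc = librosa.feature.mfcc(y=flow, sr=fs, n_mfcc=6,
                            n_fft=256, hop_length=128)  # 6th-order glottal MFCCs
print(glottal_symmetry(flow, dflow, fs), mfcc.mean(axis=1))
```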
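
Likewise, a hypothetical version of the classifier comparison can be sketched with scikit-learn stand-ins. The feature vectors below are random placeholders for the glottal features, so the printed accuracies are meaningless; OPF is not part of scikit-learn (the authors distribute an implementation as LibOPF) and is omitted here, and CART's `DecisionTreeClassifier` stands in for C4.5.

```python
# Sketch only: scikit-learn analogues of the compared classifiers,
# evaluated on synthetic placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.mixture import GaussianMixture

class GMMClassifier:
    """One Gaussian mixture per emotion class; predict by max likelihood."""
    def __init__(self, n_components=4):
        self.n_components = n_components
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {c: GaussianMixture(self.n_components,
                                           random_state=0).fit(X[y == c])
                        for c in self.classes_}
        return self
    def predict(self, X):
        ll = np.column_stack([self.models_[c].score_samples(X)
                              for c in self.classes_])
        return self.classes_[ll.argmax(axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))        # placeholder 6-dim glottal MFCC vectors
y = rng.integers(0, 4, size=400)     # 4 emotions: Happy/Angry/Sad/Neutral
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

classifiers = {
    "SVM": SVC(), "k-NN": KNeighborsClassifier(),
    "ANN-MLP": MLPClassifier(max_iter=500), "Bayes": GaussianNB(),
    "C4.5-like tree": DecisionTreeClassifier(), "GMM": GMMClassifier(),
}
for name, clf in classifiers.items():
    acc = accuracy_score(yte, clf.fit(Xtr, ytr).predict(Xte))
    print(f"{name}: {acc:.2f}")
```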