Elastic net for paralinguistic speech recognition

  • Authors: Pouria Fewzee; Fakhri Karray
  • Affiliations: University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada
  • Venue: Proceedings of the 14th ACM International Conference on Multimodal Interaction
  • Year: 2012

Abstract

Feature vectors used for paralinguistic speech recognition now run to several thousand dimensions, which makes a sparse model representation increasingly important. A sparse model is easier to interpret, generalizes better, and is numerically more efficient. In this work, we use the elastic net to search for a sparse representation of the speech features used in paralinguistic speech modeling. As benchmarks, we use the frameworks of the second Audio/Visual Emotion Challenge and the Interspeech 2012 Speaker Trait Challenge. We also propose part-of-speech tags as syntactic features of speech for emotional speech recognition. The results show that, although the suggested models use a relatively small number of features, their generalization capability is comparable to that of other models that use thousands of features and more elaborate learning algorithms.
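
To illustrate how an elastic net can yield a sparse model from a larger feature set, the following is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the common penalized least-squares form of the elastic net, (1/2n)·||y − Xb||² + lam·(alpha·||b||₁ + (1 − alpha)/2·||b||²), solved by coordinate descent, and it runs on synthetic data rather than on the challenge features. The names `elastic_net`, `lam`, and `alpha`, as well as the data sizes, are hypothetical choices made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.2, alpha=0.9, n_iter=200):
    """Coordinate descent for the elastic net (assumed parameterization).

    lam scales the whole penalty; alpha mixes the L1 part (alpha) and
    the squared L2 part (1 - alpha).
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)
            ) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new
    return coef


# Synthetic data (an assumption for illustration): 20 features, of which
# only the first three actually influence the target.
rng = random.Random(0)
n_samples, n_features = 60, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [
    2.0 * row[0] - 1.5 * row[1] + 1.0 * row[2] + rng.gauss(0.0, 0.1)
    for row in X
]

coef = elastic_net(X, y)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print("selected feature indices:", selected)
```

The L1 part of the penalty sets the coefficients of weakly relevant features exactly to zero, which is the source of the sparsity, while the L2 part stabilizes the solution when features are correlated. In practice `lam` and `alpha` would be chosen by validation on held-out data.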