In this article we present an efficient approach to modeling acoustic features for recognizing various paralinguistic phenomena. Instead of the standard scheme of adapting a Universal Background Model (UBM) represented by a Gaussian Mixture Model (GMM), normally used to model frame-level acoustic features, we propose representing the UBM with a monophone-based Hidden Markov Model (HMM). We present two approaches: transforming the monophone-segmented HMM-UBM into a GMM-UBM and proceeding with the standard adaptation scheme, or performing the adaptation directly on the HMM-UBM. Both approaches yield better results than the standard GMM-UBM adaptation scheme on both an emotion recognition task and an alcohol detection task. Furthermore, with the proposed method we achieve better results than the current state-of-the-art systems in both tasks.
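The "standard adaptation scheme" the abstract contrasts against is conventionally MAP adaptation of the UBM component means from utterance-level statistics (Reynolds-style speaker/utterance adaptation). A minimal sketch of that baseline step, assuming frame-level features stacked in a NumPy array and a diagonal-covariance GMM-UBM; the function name and the relevance factor `r` are illustrative choices, not the authors' implementation:

```python
import numpy as np

def map_adapt_means(X, weights, means, covs, r=16.0):
    """MAP-adapt the means of a diagonal-covariance GMM-UBM to data X.

    X: (T, D) frame-level acoustic features for one utterance.
    weights: (K,) mixture weights; means: (K, D); covs: (K, D) diagonal variances.
    r: relevance factor controlling how strongly the UBM prior is kept.
    """
    T, D = X.shape
    K = weights.shape[0]
    # Weighted log-likelihood of each frame under each diagonal Gaussian.
    log_p = np.empty((T, K))
    for k in range(K):
        diff = X - means[k]
        log_p[:, k] = (np.log(weights[k])
                       - 0.5 * np.sum(np.log(2 * np.pi * covs[k]))
                       - 0.5 * np.sum(diff ** 2 / covs[k], axis=1))
    # Frame posteriors (responsibilities), computed stably.
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics per component.
    n = post.sum(axis=0)                                # (K,)
    Ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]   # (K, D)
    # Interpolate between the data mean and the UBM prior mean.
    alpha = (n / (n + r))[:, None]
    return alpha * Ex + (1 - alpha) * means
```

The adapted means are then typically concatenated into a per-utterance supervector and fed to a classifier; the article's contribution is to derive the UBM from a monophone HMM instead of training the GMM directly.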