Combination of generative models and SVM based classifier for speech emotion recognition

Authors:
S. Chandrakala;C. Chandra Sekhar
Affiliations:
Department of Computer Science and Engineering, IIT Madras, Chennai, India;Department of Computer Science and Engineering, IIT Madras, Chennai, India
Venue:
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Year:
2009

Abstract

Modeling time series data of varying length is important in many domains. There are two paradigms for modeling varying-length sequential data. Tasks such as speech recognition require modeling both the temporal dynamics and the correlations among the features, and hidden Markov models (HMMs) are used for these tasks. In tasks such as speaker recognition, audio classification, and speech emotion recognition, modeling the temporal dynamics is not critical, and Gaussian mixture models (GMMs) are commonly used. Generative models such as HMMs and GMMs focus on estimating the density of the data and are not well suited to discriminating between confusable classes. Discriminative classifiers such as support vector machines (SVMs) are suited to fixed-dimensional patterns. In this paper, we propose a hybrid framework in which a generative front end represents the varying-length time series data and a discriminative model then performs the classification. Within this framework, we propose a score-based approach and a segment-modeling-based approach, and apply both to speech emotion recognition. Their performance is compared with that of an SVM classifier that uses different statistical features, and with that of GMM classifiers whose parameters are estimated by the maximum likelihood method and by the variational Bayes method. Both proposed approaches outperform the methods used for comparison.
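
The hybrid idea in the abstract (a generative front end that turns a varying-length sequence into a fixed-length vector, followed by a discriminative classifier) can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give the details of either proposed approach. The sketch assumes that a "score-based" representation means a vector of per-class log-likelihood scores. For brevity it swaps in a single diagonal-covariance Gaussian per class in place of a full GMM, and a hand-written linear SVM trained by subgradient descent in place of a library SVM. The data are synthetic, and all function names, parameters, and values are hypothetical.

```python
import math
import random

def fit_diag_gaussian(frames):
    """Fit a diagonal-covariance Gaussian (one-component stand-in for a GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(seq, model):
    """Average per-frame log-likelihood of a sequence under one class model."""
    mean, var = model
    total = 0.0
    for frame in seq:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(seq)

def score_vector(seq, models):
    """Map a varying-length sequence to a fixed-length vector of class scores."""
    return [avg_log_likelihood(seq, m) for m in models]

def standardize(X):
    """Scale each score dimension to zero mean and unit variance."""
    dim, n = len(X[0]), len(X)
    mu = [sum(x[d] for x in X) / n for d in range(dim)]
    sd = [math.sqrt(sum((x[d] - mu[d]) ** 2 for x in X) / n) or 1.0
          for d in range(dim)]
    return [[(x[d] - mu[d]) / sd[d] for d in range(dim)] for x in X]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Linear SVM: subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]      # regularization step
            if margin < 1:                             # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(x, w, b):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Synthetic stand-in data: two classes of 2-D frame sequences, 20 to 60 frames each.
rng = random.Random(0)

def make_sequence(center):
    length = rng.randint(20, 60)
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(length)]

sequences = [make_sequence(0.0) for _ in range(20)] + \
            [make_sequence(3.0) for _ in range(20)]
labels = [-1] * 20 + [1] * 20

# Generative front end: one density model per class, fitted on pooled frames.
models = []
for cls in (-1, 1):
    frames = [f for s, l in zip(sequences, labels) if l == cls for f in s]
    models.append(fit_diag_gaussian(frames))

# Fixed-length representation, then discriminative back end.
scores = standardize([score_vector(s, models) for s in sequences])
w, b = train_linear_svm(scores, labels)
accuracy = sum(predict(x, w, b) == y for x, y in zip(scores, labels)) / len(labels)
print("training accuracy on synthetic data:", accuracy)
```

Every sequence, whatever its length, is reduced to a vector with one entry per class model, which is what lets a fixed-dimensional classifier such as an SVM be applied. The accuracy printed here only checks that the toy pipeline runs on easily separable synthetic data and says nothing about the results reported in the paper.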