Maximum echo-state-likelihood networks for emotion recognition

Authors:
Edmondo Trentin;Stefan Scherer;Friedhelm Schwenker
Affiliations:
Dipartimento di Ingegneria dell’Informazione, Università degli studi di Siena, Siena, Italy;Institute of Neural Information Processing, Ulm University, Ulm, Germany;Institute of Neural Information Processing, Ulm University, Ulm, Germany
Venue:
ANNPR'10 Proceedings of the 4th IAPR TC3 conference on Artificial Neural Networks in Pattern Recognition
Year:
2010

Abstract

Emotion recognition is a relevant task in human-computer interaction. Several pattern recognition and machine learning techniques have been applied to assign input audio and/or video sequences to specific emotional classes. This paper introduces a novel approach to the problem that is also suitable for more general sequence recognition tasks. The approach combines the recurrent reservoir of an echo state network with a connectionist density estimation module. The reservoir encodes each input sequence into a fixed-dimensionality pattern of neuron activations. The density estimator, a constrained radial basis function network, evaluates the likelihood of the echo state given the input. Unsupervised training is carried out within a maximum-likelihood framework. The architecture can then be used to estimate class-conditional probabilities for emotion classification within a Bayesian setup. Preliminary experiments on emotion recognition from speech signals in the WaSeP© dataset show that the proposed approach is effective and may outperform state-of-the-art classifiers.
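
The pipeline described in the abstract (reservoir encoding, per-class density estimation, Bayesian decision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. As a simplifying assumption, a single diagonal Gaussian per class, fitted by maximum likelihood, stands in for the constrained radial basis function network. The reservoir size, the spectral radius of 0.9, the toy data, and all function and class names (`make_reservoir`, `encode`, `fit_gaussian`, `classify`, `class_a`, `class_b`) are hypothetical choices made only for illustration.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (assumed setup)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def encode(seq, w_in, w):
    """Map a (T, n_in) sequence of any length T to the final reservoir state."""
    x = np.zeros(w.shape[0])
    for u in seq:
        x = np.tanh(w_in @ u + w @ x)
    return x

def fit_gaussian(states, floor=1e-3):
    """Maximum-likelihood mean and diagonal variance (stand-in density model)."""
    return states.mean(axis=0), states.var(axis=0) + floor

def log_likelihood(x, mean, var):
    """Log-density of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(seq, w_in, w, models, log_priors):
    """Bayes decision: argmax over classes of log p(state | class) + log P(class)."""
    x = encode(seq, w_in, w)
    scores = {c: log_likelihood(x, *models[c]) + log_priors[c] for c in models}
    return max(scores, key=scores.get)

# Toy data (synthetic, not from the paper): two classes of noisy sequences
# that differ only in their offset and have random lengths.
rng = np.random.default_rng(1)

def toy_sequence(offset):
    length = rng.integers(20, 40)
    return offset + 0.1 * rng.standard_normal((length, 3))

w_in, w = make_reservoir(n_in=3, n_res=20)
train = {
    "class_a": [toy_sequence(-1.0) for _ in range(30)],
    "class_b": [toy_sequence(+1.0) for _ in range(30)],
}
# One density model per class, fitted on the encoded training sequences.
models = {
    c: fit_gaussian(np.array([encode(s, w_in, w) for s in seqs]))
    for c, seqs in train.items()
}
log_priors = {c: np.log(0.5) for c in train}  # equal priors, an assumption

predicted = classify(toy_sequence(+1.0), w_in, w, models, log_priors)
print(predicted)
```

The reservoir weights are never trained in this sketch; only the per-class density parameters are estimated, which mirrors the division of labour the abstract describes between the fixed recurrent encoder and the likelihood module.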