Enhancement of emotion detection in spoken dialogue systems by combining several information sources

Authors:
Ramón López-Cózar;Jan Silovsky;Martin Kroul
Affiliations:
Dept. of Languages and Computer Systems, Faculty of Computer Science, University of Granada, Spain;Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, Czech Republic;Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, Czech Republic
Venue:
Speech Communication
Year:
2011

Abstract

This paper proposes a technique to enhance emotion detection in spoken dialogue systems by means of two modules that combine different information sources. The first module, called Fusion-0, combines emotion predictions generated by a set of classifiers that deal with different kinds of information about each sentence uttered by the user. To do this, the module employs several information fusion methods that produce further predictions about the emotional state of the user. These predictions are the input to the second information fusion module, called Fusion-1, where they are combined to deduce the emotional state of the user. Fusion-0 represents a method employed in previous studies to enhance classification rates, whereas Fusion-1 represents the novelty of the technique, which is the combination of the emotion predictions generated by Fusion-0. One advantage of the technique is that it can be applied as a post-processing stage to any other method that combines information from different sources at the decision level. This is possible because the technique works on the predictions (outputs) of those methods, without interfering with the procedure used to obtain them. Another advantage is that the technique can be implemented as a modular architecture, which makes it easier to set up within a spoken dialogue system and to deduce the emotional state of the user in real time. Experiments were carried out with classifiers that deal with prosodic, acoustic, lexical, and dialogue-act information, and with three methods to combine information: multiplication of probabilities, average of probabilities, and unweighted vote. The results show that the technique improves on the classification rates of the standard fusion by 2.27% and 3.38% absolute in experiments with two and three emotion categories, respectively.
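
The two-stage arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the emotion labels ("neutral", "negative"), and the probability values are all hypothetical. The abstract also does not say how Fusion-1 combines the Fusion-0 predictions, so the sketch assumes a simple majority vote at that stage.

```python
from collections import Counter
from math import prod

# Each classifier output is a dict mapping an emotion label to a probability.
# All labels and numbers below are illustrative, not taken from the paper.

def product_rule(probs):
    """Multiplication of probabilities: pick the label with the largest product."""
    scores = {c: prod(p[c] for p in probs) for c in probs[0]}
    return max(scores, key=scores.get)

def average_rule(probs):
    """Average of probabilities: pick the label with the largest mean."""
    scores = {c: sum(p[c] for p in probs) / len(probs) for c in probs[0]}
    return max(scores, key=scores.get)

def vote_rule(probs):
    """Unweighted vote: each classifier votes for its own most probable label."""
    votes = Counter(max(p, key=p.get) for p in probs)
    return votes.most_common(1)[0][0]

def fusion_0(probs):
    """First stage: apply each fusion method to the classifier outputs."""
    return [product_rule(probs), average_rule(probs), vote_rule(probs)]

def fusion_1(predictions):
    """Second stage (ASSUMPTION: majority vote over the Fusion-0 predictions)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs for one user sentence, one dict per information source.
classifier_outputs = [
    {"neutral": 0.60, "negative": 0.40},  # prosodic
    {"neutral": 0.55, "negative": 0.45},  # acoustic
    {"neutral": 0.10, "negative": 0.90},  # lexical
    {"neutral": 0.60, "negative": 0.40},  # dialogue acts
]

stage_0 = fusion_0(classifier_outputs)
final = fusion_1(stage_0)
print(stage_0, final)
```

In this made-up example the three fusion methods disagree: the product and the average both favour "negative" because the lexical classifier is very confident, while the unweighted vote favours "neutral". The second stage resolves the disagreement, which is the kind of situation in which combining the Fusion-0 predictions could help. Because the second stage reads only the first-stage outputs, it can be placed after any decision-level fusion scheme without changing how that scheme works.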