Level of interest sensing in spoken dialog using decision-level fusion of acoustic and lexical evidence

Authors:
Je Hun Jeon; Rui Xia; Yang Liu
Venue:
Computer Speech and Language
Year:
2014

Abstract

Automatic detection of a user's interest in spoken dialog plays an important role in many applications, such as tutoring systems and customer service systems. In this study, we propose a decision-level fusion approach that uses acoustic and lexical information to accurately sense a user's interest at the utterance level. Our system consists of three parts: an acoustic/prosodic model, a lexical model, and a model that combines their decisions to produce the final output. For the acoustic model, we use two different regression algorithms that complement each other. For lexical information, in addition to the bag-of-words model, we propose new features, including a level-of-interest value for each word, utterance length measured by the number of words, estimated speaking rate, silence in the utterance, and similarity with other utterances. We also investigate the effectiveness of using multiple automatic speech recognition (ASR) hypotheses (n-best lists) to extract lexical features. The outputs of the acoustic and lexical models are combined at the decision level. Our experiments show that combining acoustic evidence with lexical information improves level-of-interest detection performance, even when the lexical features are extracted from ASR output with a high word error rate.
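
The idea of decision-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state which combination rule the authors use, so the weighted average below is an assumption chosen for simplicity, and the function name `fuse_scores` and the parameter `weight` are hypothetical. Each input list is assumed to hold one level-of-interest score per utterance, produced by an already trained acoustic or lexical model.

```python
from typing import List, Sequence


def fuse_scores(
    acoustic: Sequence[float],
    lexical: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Combine per-utterance scores from two models by weighted averaging.

    This is an illustrative fusion rule, not the method from the paper.
    `weight` is the share given to the acoustic score; the lexical score
    receives the remaining `1 - weight`.
    """
    if len(acoustic) != len(lexical):
        raise ValueError("both models must score the same utterances")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Fuse the two decisions utterance by utterance.
    return [weight * a + (1.0 - weight) * l for a, l in zip(acoustic, lexical)]


if __name__ == "__main__":
    # Hypothetical scores for three utterances.
    acoustic_scores = [0.2, 0.8, 0.5]
    lexical_scores = [0.4, 0.6, 0.9]
    print(fuse_scores(acoustic_scores, lexical_scores, weight=0.5))
```

Because fusion happens on the model outputs rather than on the features, either model could be retrained or replaced without touching the other. In practice the weight would typically be tuned on held-out data rather than fixed at 0.5.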