Automatic detection of a user's interest in spoken dialog plays an important role in many applications, such as tutoring systems and customer service systems. In this study, we propose a decision-level fusion approach that uses acoustic and lexical information to accurately sense a user's interest at the utterance level. Our system consists of three parts: an acoustic/prosodic model, a lexical model, and a model that combines their decisions for the final output. For the acoustic model, we use two different regression algorithms that complement each other. For lexical information, in addition to the bag-of-words model, we propose new features including a level-of-interest value for each word, length information based on the number of words, estimated speaking rate, silence in the utterance, and similarity with other utterances. We also investigate the effectiveness of using multiple automatic speech recognition (ASR) hypotheses (n-best lists) to extract lexical features. The outputs of the acoustic and lexical models are combined at the decision level. Our experiments show that combining acoustic evidence with lexical information improves level-of-interest detection performance, even when lexical features are extracted from ASR output with a high word error rate.
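
The following is a minimal sketch of the decision-level (late) fusion step described above, assuming that the acoustic and lexical models each produce a per-utterance level-of-interest score; the linear weighting, function names, and example scores are illustrative assumptions, not the authors' actual implementation.

    # Illustrative sketch (assumed, not the paper's exact scheme): late fusion of
    # per-utterance level-of-interest scores from an acoustic and a lexical model.

    def fuse_decisions(acoustic_score: float, lexical_score: float,
                       acoustic_weight: float = 0.5) -> float:
        """Linearly combine the two sub-models' scores into one fused estimate."""
        lexical_weight = 1.0 - acoustic_weight
        return acoustic_weight * acoustic_score + lexical_weight * lexical_score

    if __name__ == "__main__":
        # Hypothetical per-utterance scores in [0, 1] from the two sub-models.
        acoustic = 0.72   # e.g., averaged output of the two regression algorithms
        lexical = 0.58    # e.g., score from bag-of-words plus word-level LOI features
        print(f"fused level-of-interest: {fuse_decisions(acoustic, lexical):.3f}")

In practice, the fusion weight could be tuned on a development set; the sketch only shows where the two models' decisions meet rather than how either model is trained.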