Automatic detection of speaker state: Lexical, prosodic, and phonetic approaches to level-of-interest and intoxication classification

Authors:
William Yang Wang;Fadi Biadsy;Andrew Rosenberg;Julia Hirschberg
Affiliations:
Department of Computer Science, Columbia University, United States;Department of Computer Science, Columbia University, United States;Computer Science Department, Queens College (CUNY), United States;Department of Computer Science, Columbia University, United States
Venue:
Computer Speech and Language
Year:
2013

Abstract

Traditional studies of speaker state focus primarily on one-stage classification techniques using standard acoustic features. In this article, we investigate multiple novel features and approaches to two recent tasks in speaker state detection: level-of-interest (LOI) detection and intoxication detection. For LOI prediction, we propose a novel Discriminative TFIDF feature to capture important lexical information and a novel Prosodic Event detection approach using AuToBI; we combine these with acoustic features using a new multilevel multistream prediction feedback and similarity-based hierarchical fusion learning approach. Our experimental results outperform the published results of all systems in the 2010 Interspeech Paralinguistic Challenge - Affect Subchallenge. For intoxication detection, we evaluate the performance of Prosodic Event-based, phone duration-based, phonotactic, and phonetic-spectral-based approaches, finding that a combination of the phonotactic and phonetic-spectral approaches achieves a significant improvement over the 2011 Interspeech Speaker State Challenge - Intoxication Subchallenge baseline. We discuss our results using these new features and approaches and their implications for future research.
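
For readers unfamiliar with the lexical weighting that the Discriminative TFIDF feature builds on, the following is a minimal sketch of plain TF-IDF over tokenized utterances. It is not the discriminative variant proposed in the article, whose definition is not given in this abstract. The function name `tfidf`, the weighting formula (relative term frequency times the log of inverse document frequency), and the toy utterances are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Plain TF-IDF (an illustrative assumption, not the paper's feature):
    weight = (count / doc_length) * log(n_docs / doc_frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy utterances, not data from the article.
utterances = [
    "that sounds really great".split(),
    "that sounds fine".split(),
    "that is fine".split(),
]
weights = tfidf(utterances)
```

In this toy example a word that occurs in every utterance ("that") receives weight zero, while a word that occurs in only one utterance ("great") receives the largest weight in that utterance. A discriminative variant would presumably also use the class labels, but how it does so is not specified here.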