Generative modeling and classification of dialogs by a low-level turn-taking feature

Authors:
Marco Cristani;Anna Pesarin;Carlo Drioli;Alessandro Tavano;Alessandro Perina;Vittorio Murino
Affiliations:
Istituto Italiano di Tecnologia (IIT), Via Morego 30, 16163 Genova, Italy;Dipartimento di Informatica, University of Verona, Strada le Grazie 15, 37134 Verona, Italy;Dipartimento di Informatica, University of Verona, Strada le Grazie 15, 37134 Verona, Italy;Institute of Psychology, University of Leipzig, Seeburgstr. 14-20, 04103 Leipzig, Germany;Microsoft Research, One Microsoft Way, 98052 Redmond, WA;Istituto Italiano di Tecnologia (IIT), Via Morego 30, 16163 Genova, Italy
Venue:
Pattern Recognition
Year:
2011

Abstract

In the last few years, growing attention has been paid to human-human communication, with the aim of devising artificial systems able to mediate a conversational setting between two or more people. In this paper, we propose an automatic system, based on a generative structure, that classifies dialog scenarios. The generative model integrates a Gaussian mixture model and an (observed) Markovian influence model, and it is fed with a novel low-level acoustic feature termed the steady conversational period (SCP). SCPs are built on the duration of continuous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features, and may be important for predicting the evolution of typical conversational situations in different dialog scenarios. The model has been tested on an extensive set of real dyadic and multi-person conversational settings, including a recent dyadic dataset and the AMI meeting corpus. Comparative tests using conventional acoustic features and classification methods show that the proposed scheme provides superior classification performance for all conversational settings in our datasets. Moreover, we show that our approach characterizes the nature of multi-person conversation (namely, the role of the participants) very accurately, thus demonstrating great versatility.
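
As a rough illustration of the kind of feature the abstract describes, the sketch below segments two aligned voice-activity sequences into periods during which neither speaker changes state, and records each period's duration. This is only one plausible reading of "continuous slots of silence or speech, taking turn-taking into account"; the abstract does not give the exact SCP definition. The function name `steady_periods`, the binary frame-level input format, and the use of frame counts as durations are all assumptions made for this example, not the authors' method.

```python
from itertools import groupby


def steady_periods(vad_a, vad_b):
    """Split two aligned voice-activity sequences into steady periods.

    vad_a, vad_b: equal-length sequences of 0 (silence) / 1 (speech),
    one value per frame (hypothetical input format).
    Returns a list of ((state_a, state_b), duration_in_frames), where a
    new period starts whenever either speaker changes state.
    """
    if len(vad_a) != len(vad_b):
        raise ValueError("voice-activity sequences must be aligned")
    joint = zip(vad_a, vad_b)
    # groupby merges consecutive identical joint states into one run
    return [(state, sum(1 for _ in run)) for state, run in groupby(joint)]


# Toy dyadic exchange: A speaks, B overlaps briefly, B takes the turn,
# both pause, then A resumes.
speaker_a = [1, 1, 1, 0, 0, 0, 0, 1]
speaker_b = [0, 0, 1, 1, 1, 0, 0, 0]
periods = steady_periods(speaker_a, speaker_b)
```

In a pipeline like the one the abstract outlines, durations of this kind could then be modeled by a mixture model and the sequence of periods by a Markovian model; those stages are not sketched here.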