Generative modeling and classification of dialogs by a low-level turn-taking feature

  • Authors:
  • Marco Cristani;Anna Pesarin;Carlo Drioli;Alessandro Tavano;Alessandro Perina;Vittorio Murino

  • Affiliations:
  • Istituto Italiano di Tecnologia (IIT), Via Morego 30, 16163 Genova, Italy;Dipartimento di Informatica, University of Verona, Strada le Grazie 15, 37134 Verona, Italy;Dipartimento di Informatica, University of Verona, Strada le Grazie 15, 37134 Verona, Italy;Institute of Psychology, University of Leipzig, Seeburgstr. 14-20, 04103 Leipzig, Germany;Microsoft Research, One Microsoft Way, 98052 Redmond, WA;Istituto Italiano di Tecnologia (IIT), Via Morego 30, 16163 Genova, Italy

  • Venue:
  • Pattern Recognition
  • Year:
  • 2011

Abstract

In the last few years, growing attention has been paid to the problem of human-human communication, with the aim of devising artificial systems able to mediate a conversational setting between two or more people. In this paper, we propose an automatic system based on a generative structure that classifies dialog scenarios. The generative model integrates a Gaussian mixture model and an (observed) Markovian influence model, and it is fed with a novel low-level acoustic feature termed the steady conversational period (SCP). SCPs are built on the duration of continuous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features, and may be important for predicting the evolution of typical conversational situations in different dialog scenarios. The model has been tested on an extensive set of real dyadic and multi-person conversational settings, including a recent dyadic dataset and the AMI meeting corpus. Comparative tests using conventional acoustic features and classification methods show that the proposed scheme provides superior classification performance for all conversational settings in our datasets. Moreover, we show that our approach characterizes the nature of multi-person conversation (namely, the role of the participants) very accurately, thus demonstrating great versatility.
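
The abstract describes SCPs only informally, so the following is a minimal sketch of one plausible reading, not the authors' extraction procedure. It assumes each speaker in a dyadic conversation has already been reduced to a per-frame binary voice-activity track (1 for speech, 0 for silence), and it treats a steady period as a maximal run of frames in which the joint speech/silence state of both speakers stays unchanged. The function name `steady_periods`, the parameter `frame_sec`, and the joint-state interpretation are all assumptions introduced here for illustration.

```python
from itertools import groupby


def steady_periods(speaker_a, speaker_b, frame_sec=0.01):
    """Split two binary activity tracks into runs of constant joint state.

    Hypothetical helper: returns a list of (joint_state, duration_sec)
    pairs, where joint_state is a tuple (a_active, b_active) and
    duration_sec is the run length in frames times frame_sec.
    """
    if len(speaker_a) != len(speaker_b):
        raise ValueError("activity tracks must have the same length")

    # Pair up the two tracks frame by frame, then group consecutive
    # frames that share the same joint state.
    joint = zip(speaker_a, speaker_b)
    periods = []
    for state, run in groupby(joint):
        n_frames = sum(1 for _ in run)
        periods.append((state, n_frames * frame_sec))
    return periods


if __name__ == "__main__":
    # Toy tracks (assumed data): A speaks, both pause, B speaks, overlap.
    a = [1, 1, 1, 0, 0, 0, 0, 1]
    b = [0, 0, 0, 0, 0, 1, 1, 1]
    for state, duration in steady_periods(a, b, frame_sec=1.0):
        print(state, duration)
```

Under this reading, the sequence of durations could serve as the observations for a mixture model, and the sequence of joint states as the input to a transition model. How the paper actually combines the two is not specified in the abstract.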