Modelling and analyzing multimodal dyadic interactions using social networks

  • Authors:
  • Sergio Escalera;Petia Radeva;Jordi Vitrià;Xavier Baró;Bogdan Raducanu

  • Affiliations:
  • Universitat de Barcelona, Barcelona, Spain and Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain;Universitat de Barcelona, Barcelona, Spain and Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain;Universitat de Barcelona, Barcelona, Spain and Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain;Universitat Oberta de Catalunya, Barcelona, Spain and Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain;Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain

  • Venue:
  • International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
  • Year:
  • 2010

Abstract

Social network analysis has become a common technique for modelling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected through clustering of audio features. Clusters are modelled by means of a one-state Hidden Markov Model containing a diagonal-covariance Gaussian Mixture Model. In the visual domain, speech detection is performed through differential-based feature extraction from the segmented mouth region, followed by a dynamic programming matching procedure. Second, in order to model the dyadic interactions, we employ the Influence Model, whose states encode the previously integrated audio/visual data. Third, the social network is extracted based on the estimated influences. For our study, we used a set of videos from the New York Times' Blogging Heads opinion blog. The results are reported in terms of both the accuracy of the audio/visual data fusion and the centrality measures used to characterize the social network.
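
The third step, going from estimated pairwise influences to a network characterized by centrality, can be illustrated with a minimal sketch. The abstract does not say which centrality measures are reported, so weighted in-degree and out-degree are used here purely as an assumption. The speaker labels, the influence values, and the function name `weighted_degree_centrality` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical influence estimates: (source, target) -> influence weight.
# Labels and values are illustrative only, not results from the paper.
Influence = Dict[Tuple[str, str], float]


def weighted_degree_centrality(influence: Influence) -> Dict[str, Dict[str, float]]:
    """Sum outgoing and incoming influence weights for each speaker."""
    # Collect every speaker that appears as a source or a target.
    nodes = sorted({node for edge in influence for node in edge})
    scores = {node: {"out": 0.0, "in": 0.0} for node in nodes}
    for (src, dst), weight in influence.items():
        scores[src]["out"] += weight  # influence exerted by src
        scores[dst]["in"] += weight   # influence received by dst
    return scores


# Three speakers observed in two dyadic conversations (A-B and A-C).
example: Influence = {
    ("A", "B"): 0.5,
    ("B", "A"): 0.25,
    ("A", "C"): 0.25,
    ("C", "A"): 0.5,
}
centrality = weighted_degree_centrality(example)
```

In this toy graph, speaker A takes part in both conversations and therefore accumulates the largest totals in both directions. Other measures, such as closeness or betweenness, could be computed on the same directed weighted graph.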