Bimodal log-linear regression for fusion of audio and visual features

  • Authors:
  • Ognjen Rudovic;Stavros Petridis;Maja Pantic

  • Affiliations:
  • Imperial College London, London, United Kingdom;Imperial College London, London, United Kingdom;Imperial College London - Univ. Twente, London, United Kingdom

  • Venue:
  • Proceedings of the 21st ACM international conference on Multimedia
  • Year:
  • 2013


Abstract

One of the most commonly used audiovisual fusion approaches is feature-level fusion, where the audio and visual features are concatenated. Although this approach has been successfully used in several applications, it does not take into account interactions between the features, which can be a problem when one or both modalities have noisy features. In this paper, we investigate whether feature fusion based on explicit modelling of interactions between audio and visual features can outperform a classifier trained on the simple concatenation of the audio-visual features. To this end, we propose a log-linear model, named Bimodal Log-linear regression, which accounts for interactions between the features of the two modalities. The performance of the target classifiers is measured in the task of laughter-vs-speech discrimination, since both laughter and speech are naturally audiovisual events. Our experiments on the MAHNOB laughter database suggest that feature fusion based on explicit modelling of interactions between the audio-visual features leads to an improvement of 3% over the standard feature concatenation approach, when a log-linear model is used as the base classifier. Finally, the most and least influential features can be easily identified by observing their interactions.
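The core idea described above can be sketched in a few lines: augment the concatenated audio-visual feature vector with all pairwise audio-visual products a_i * v_j, then fit an ordinary log-linear (logistic regression) classifier on the expanded representation. This is a minimal illustration of the interaction-feature principle, not the paper's exact parameterization; the toy data, function names, and gradient-descent training loop are assumptions for the sketch.

```python
import numpy as np

def bimodal_features(a, v):
    """Concatenate audio (a) and visual (v) features with all pairwise
    audio-visual interaction terms a_i * v_j.
    Hypothetical construction; the paper's exact model may differ."""
    inter = np.einsum('ni,nj->nij', a, v).reshape(len(a), -1)
    return np.concatenate([a, v, inter], axis=1)

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain log-linear (logistic regression) fit via batch gradient descent."""
    Xb = np.concatenate([X, np.ones((len(X), 1))], axis=1)  # append bias term
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)   # gradient of the log loss
    return w

def predict(w, X):
    Xb = np.concatenate([X, np.ones((len(X), 1))], axis=1)
    return (1.0 / (1.0 + np.exp(-Xb @ w)) > 0.5).astype(int)

# Toy data where the class label depends on an audio-visual interaction,
# so plain concatenation alone cannot separate it linearly.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 3))               # "audio" features
v = rng.normal(size=(200, 2))               # "visual" features
y = (a[:, 0] * v[:, 0] > 0).astype(int)     # label driven by a0 * v0

X = bimodal_features(a, v)                  # 3 + 2 + 3*2 = 11 features
w = fit_logreg(X, y)
acc = (predict(w, X) == y).mean()
```

Because the interaction block includes the product a_0 * v_0 as an explicit feature, the log-linear model can pick it up with a single weight, which also illustrates the abstract's final point: inspecting the learned interaction weights identifies the most and least influential feature pairs.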