Unsupervised event segmentation of news content with multimodal cues

  • Authors:
  • Mattia Broilo;Eric Zavesky;Andrea Basso;Francesco G. B. De Natale

  • Affiliations:
  • DISI Unitn, Trento, Italy;AT&T Labs Research, Middletown, NJ, USA;AT&T Labs Research, Middletown, NJ, USA;DISI Unitn, Trento, Italy

  • Venue:
  • Proceedings of the 3rd international workshop on Automated information extraction in media production
  • Year:
  • 2010


Abstract

In the age of content snacking and mobisodes (mobile episodes), the paradigm of media consumption is changing radically. Media consumption is moving from monolithic, prepackaged, well-edited, and elaborate content presentation to a continuous feed of brief segments: singleton episodes and few-minute videos, often supported by or initiated via tweets and status updates. In these updates, attention spans are short, and the content packaging matters less than the dynamic, 'streaming' aspect of the information. This trend has a profound influence on the segmentation requirements needed to make this stream of information possible. In this paper, we present a novel method to automatically extract structured content (events) from news video in an unsupervised fashion, where events include major cast interviews, dialogs, background segments, etc. Two key ideas differentiate this unsupervised method from others: the type of information used to find events and the method used to combine that information into coherent multimedia events. The proposed system exploits audio, visual appearance, detected faces, and mid-level semantic concepts from every video shot, but instead of combining everything at once, the framework clusters each modality independently and then applies coherence rules to assemble the multimedia events. Additionally, we discuss the effect of segmentation errors in practical retrieval and content consumption tasks.
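The late-fusion idea in the abstract, clustering each modality independently and then merging shots into events via coherence rules, can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' actual implementation: the function name, the agreement threshold, and the specific rule ("a shot continues the current event if enough modalities keep the same cluster label across the boundary") are all hypothetical stand-ins for the paper's coherence rules.

```python
def segment_events(shots, min_agreeing_modalities=2):
    """Assemble temporally contiguous events from per-modality cluster labels.

    shots: list of dicts, one per shot in temporal order, each mapping a
    modality name (e.g. 'audio', 'visual', 'face', 'concept') to the
    cluster label that modality's independent clustering assigned.
    Returns a list of (start_index, end_index) event spans, inclusive.
    """
    events = []
    start = 0
    for i in range(1, len(shots)):
        prev, cur = shots[i - 1], shots[i]
        # Hypothetical coherence rule: the event continues when at least
        # `min_agreeing_modalities` modalities keep the same label across
        # the shot boundary; otherwise a new event begins here.
        agree = sum(prev[m] == cur[m] for m in cur)
        if agree < min_agreeing_modalities:
            events.append((start, i - 1))
            start = i
    events.append((start, len(shots) - 1))
    return events

# Toy example with four modality clusterings per shot.
shots = [
    {"audio": 0, "visual": 0, "face": 1, "concept": 0},  # anchor intro
    {"audio": 0, "visual": 0, "face": 1, "concept": 0},
    {"audio": 1, "visual": 2, "face": 0, "concept": 1},  # interview
    {"audio": 1, "visual": 2, "face": 0, "concept": 1},
    {"audio": 2, "visual": 3, "face": 2, "concept": 2},  # field report
]
print(segment_events(shots))  # → [(0, 1), (2, 3), (4, 4)]
```

Keeping the per-modality clusterings separate until this final assembly step is the design choice the abstract highlights: a boundary missed by one modality (say, a lighting change fooling the visual clustering) can still be recovered or rejected by agreement among the others.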