Fast unsupervised alignment of video and text for indexing/names and faces

  • Authors:
  • Subhransu Maji; Ruzena Bajcsy

  • Affiliations:
  • University of California: Berkeley, Berkeley, CA; University of California: Berkeley, Berkeley, CA

  • Venue:
  • Workshop on multimedia information retrieval on The many faces of multimedia semantics
  • Year:
  • 2007

Abstract

We propose a novel, unsupervised way of combining weakly associated video/audio and text streams that is faster than conventional speech recognition. The technique aligns the audio/video and text streams, which enables video search using the associated text. Multimedia of this form includes news broadcasts with summaries, parliamentary proceedings and court trials with transcripts, sports telecasts with text commentary, etc. We also show how to annotate the video with the names of the people appearing in it, which allows name-based indexing and search. We test the technique on an 80-minute video segment downloaded from the website of the International Court of the Former Yugoslavia, together with the corresponding transcripts. The proposed technique achieves 88.49% accuracy on sentence-level alignments and 95.5% accuracy on the task of assigning names to faces.