A System for Effortless Content Annotation to Unfold the Semantics in Videos

  • Authors:
  • Rainer Lienhart

  • Venue:
  • CBAIVL '00: Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries
  • Year:
  • 2000

Abstract

In this paper we propose and investigate a new but simple and natural extension of the way people record video. The extension unfolds the semantics of video clips and thus enables a completely new set of applications on raw video footage. Two microphones are connected to a camcorder: a head-worn speech input microphone and an environmental microphone. During recording, the cameraman speaks content-descriptive annotations and/or editing commands aloud. Thanks to the two-microphone setup, the spoken annotations and editing commands can be removed from the environmental audio by adaptive filtering, so the video can be played back as if there had been no annotations. At the same time, the annotations are transcribed to ASCII text by a standard speech recognition engine. The viability of this approach is demonstrated with an important application for video libraries: the automatic abstraction of raw video footage.
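
The abstract does not say which adaptive filter is used. As a rough illustration of the two-microphone idea only, the sketch below uses a normalized LMS (NLMS) canceller, which is an assumption and not necessarily the paper's method. The head-worn microphone serves as the reference signal, the environmental microphone as the primary signal, and the filter learns how the speech leaks into the environmental channel so that it can be subtracted. The function names, the tap count, the step size, and the synthetic signals in `demo` are all hypothetical.

```python
import math
import random


def nlms_cancel(primary, reference, taps=4, mu=0.1, eps=1e-8):
    """Subtract from `primary` whatever is linearly predictable from `reference`.

    primary   : environmental microphone samples (ambient sound + leaked speech)
    reference : head-worn microphone samples (speech only, by assumption)
    Returns the error signal, i.e. the estimate of the ambient sound.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # NLMS update: step size is normalized by the reference signal energy.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def demo(n=4000, seed=0):
    """Synthetic check: speech leaks into the ambient channel via a 2-tap path."""
    rng = random.Random(seed)
    speech = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # stand-in for speech
    ambient = [0.1 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    leak = [0.6 * speech[i] + (0.3 * speech[i - 1] if i > 0 else 0.0)
            for i in range(n)]
    primary = [a + l for a, l in zip(ambient, leak)]
    cleaned = nlms_cancel(primary, speech)
    # Compare powers over the second half, after the filter has adapted.
    half = n // 2
    count = n - half
    leak_power = sum(v * v for v in leak[half:]) / count
    residual_power = sum((c - a) ** 2
                         for c, a in zip(cleaned[half:], ambient[half:])) / count
    return leak_power, residual_power


if __name__ == "__main__":
    leak_power, residual_power = demo()
    print("leaked speech power:", leak_power)
    print("residual speech power after cancellation:", residual_power)
```

In this toy setting the residual speech power after adaptation should be well below the leaked speech power. Real recordings would need a longer filter, and the reference microphone would also pick up some ambient sound, which the sketch ignores.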