A System for Effortless Content Annotation to Unfold the Semantics in Videos

Authors:
Rainer Lienhart
Venue:
CBAIVL '00: Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries
Year:
2000

Abstract

In this paper we propose and investigate a new, simple, and natural extension of the way people record video. This extension unfolds the semantics of video clips and thus enables a completely new set of applications on raw video footage. Two microphones are connected to a camcorder: a head-worn speech input microphone and an environmental microphone. During recording, the cameraman speaks content-descriptive annotations and/or editing commands aloud. Thanks to the two-microphone setup, the sound of annotations and editing commands can be removed from the environmental audio by adaptive filtering, so that people can play back the video as if there had been no annotations. Simultaneously, these annotations are transcribed to ASCII by a standard speech recognition engine. The viability of this approach is demonstrated with an important application for video libraries: the automatic abstraction of raw video footage.
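
The abstract does not say which adaptive filter is used. As an illustration only, the following minimal sketch uses a normalized LMS (NLMS) canceller, a standard choice for this kind of two-microphone setup, and should not be read as the paper's implementation. It assumes the head-worn microphone picks up essentially only the spoken annotations and that the speech leaks into the environmental microphone through a short linear path. The function names, the tap count, the step size, and the synthetic signals in `demo` are all hypothetical.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """Remove the part of `primary` that is linearly predictable from `reference`.

    primary   -- environmental microphone samples (scene audio + leaked speech)
    reference -- head-worn microphone samples (assumed to hold speech only)
    Returns the error signal, i.e. the estimate of the scene audio alone.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Estimated speech component as heard by the environmental microphone.
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y  # what remains is the scene audio estimate
        # NLMS update: step size normalized by the reference energy.
        norm = sum(h * h for h in history) + eps
        step = mu * e / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(e)
    return cleaned


def power(signal):
    """Mean squared value of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def demo(n=4000, seed=0):
    """Return residual speech power before and after cancellation (synthetic data)."""
    rng = random.Random(seed)
    # Assumption: white noise stands in for annotation speech,
    # a quiet sine wave stands in for the scene audio.
    speech = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ambient = [0.1 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    path = [0.6, 0.3, -0.2]  # hypothetical leakage path between the microphones
    leaked = [
        sum(path[j] * speech[k - j] for j in range(len(path)) if k - j >= 0)
        for k in range(n)
    ]
    primary = [a + l for a, l in zip(ambient, leaked)]
    cleaned = nlms_cancel(primary, speech)
    tail = n // 2  # measure after the filter has had time to adapt
    before = power([p - a for p, a in zip(primary[tail:], ambient[tail:])])
    after = power([c - a for c, a in zip(cleaned[tail:], ambient[tail:])])
    return before, after
```

Calling `demo()` returns the power of the speech residue in the environmental channel before and after filtering. On this synthetic input the second value should be far smaller than the first. Real recordings would need a longer filter and would violate the speech-only assumption to some degree, since the head-worn microphone also picks up some scene audio.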