Extracting Textual Inserts from Digital Videos

Authors:
A. Miene;G. Ioannidis
Affiliations:
-;-
Venue:
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year:
2001

Citing 0
Cited 3

Progress in Camera-Based Document Image Analysis
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Segmentation and Recognition of Characters in Scene Images Using Selective Binarization in Color Space and GAT Correlation
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Automatic annotation of geographic maps
ICCHP'06 Proceedings of the 10th international conference on Computers Helping People with Special Needs

Abstract

Textual inserts and closed captions superimposed on digital videos often contain important and exclusive information about the video contents that cannot be found in other information channels such as the audio signal or the underlying video stream. It is therefore very helpful to extract this textual information automatically and add it to a video index as generated by video archiving and retrieval systems such as ADViSOR, AVAnTA [5, 9, 10], DiVA [4] or Informedia [7, 14]. Because common OCR systems are restricted to binary images, the video frames have to be preprocessed in order to separate the textual inserts from the background image. In this paper we present our approach to the segmentation of textual inserts from digital videos or images, which consists of a region-growing method for color segmentation and a method of separating text regions from the background based on character size and alignment constraints. A new method for segmentation refinement that takes the results of the classification step into account leads to a significant enhancement in the quality of the resulting binary images. The main difficulties in extracting textual inserts from video are caused by the low resolution and quality of digital video material, the large amount of image data, the highly complex, structured and textured background, and the unknown color, size, and position of the text to be extracted from the image.
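
The two stages named in the abstract (region growing for color segmentation, then a filter on character size and alignment) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract gives no parameters or algorithmic details, so everything here is an assumption made for illustration: the function names, the seed-color similarity test, the 4-connectivity, the bounding-box size limits, and the shared-baseline rule used as the alignment constraint. The segmentation refinement step is not covered.

```python
from collections import deque


def color_distance(a, b):
    """Largest per-channel absolute difference between two RGB tuples."""
    return max(abs(x - y) for x, y in zip(a, b))


def grow_regions(image, threshold=30):
    """Split the image into 4-connected regions by region growing.

    A pixel joins a region if its color is within `threshold` of the
    region's seed color (an assumed, simplified similarity test).
    """
    height, width = len(image), len(image[0])
    visited = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if visited[r][c]:
                continue
            seed = image[r][c]
            visited[r][c] = True
            pixels, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not visited[ny][nx]
                            and color_distance(image[ny][nx], seed) <= threshold):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a list of (row, col) pixels."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def is_character_sized(box, min_h, max_h, max_w):
    """Size constraint: the bounding box must look like a single character."""
    top, left, bottom, right = box
    box_height = bottom - top + 1
    box_width = right - left + 1
    return min_h <= box_height <= max_h and box_width <= max_w


def extract_text_mask(image, threshold=30, min_h=3, max_h=40, max_w=40,
                      baseline_tol=1):
    """Return a binary mask (1 = text) of size-filtered, baseline-aligned regions."""
    regions = grow_regions(image, threshold)
    boxes = [bounding_box(p) for p in regions]
    candidates = [i for i, box in enumerate(boxes)
                  if is_character_sized(box, min_h, max_h, max_w)]
    mask = [[0] * len(image[0]) for _ in image]
    for i in candidates:
        # Alignment constraint (assumed): at least one other candidate
        # must have its bottom edge on roughly the same row.
        aligned = any(j != i and abs(boxes[j][2] - boxes[i][2]) <= baseline_tol
                      for j in candidates)
        if aligned:
            for y, x in regions[i]:
                mask[y][x] = 1
    return mask
```

Given a frame as a list of rows of RGB tuples, `extract_text_mask(frame)` returns a binary image of the kind that could be passed to an OCR system. On real video material the thresholds would need tuning, and comparing against the seed color only is a deliberate simplification.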