Temporally consistent caption detection in videos using a spatiotemporal 3D method

  • Authors:
  • Dong-Qing Zhang, Sitaram Bhagavathy, Joan Llach

  • Affiliations:
  • Thomson Corporate Research, Princeton, NJ (all authors)

  • Venue:
  • ICIP '09: Proceedings of the 16th IEEE International Conference on Image Processing
  • Year:
  • 2009


Abstract

Captions are text or logos superimposed on videos during postproduction. Caption detection in videos is useful for a variety of applications, and for many of them the temporal consistency and stability of the detected regions are essential. Most prior work applies post-processing procedures to smooth the detected caption bounding boxes over time. Although such approaches mitigate temporal inconsistency, they cannot eliminate it. In this paper, we present a new caption detection algorithm that detects the 3D bounding boxes of caption regions in the spatiotemporal volume. 2D bounding boxes are then created by slicing the 3D bounding boxes. Since all the 2D bounding boxes corresponding to a caption area are sliced from a single 3D bounding box, they are identical over time, which ensures temporal consistency of the result. Experimental results show that our new approach not only produces temporally consistent results but also achieves higher detection accuracy.
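
To illustrate the slicing step described in the abstract, the sketch below (a minimal Python example, not the authors' implementation; the Box3D type and slice_to_2d function are hypothetical names) shows how slicing a single 3D spatiotemporal bounding box yields an identical 2D rectangle for every frame in its temporal extent, which is what guarantees temporal consistency.

    # Minimal sketch, assuming a caption is represented by one axis-aligned
    # 3D box spanning a spatial rectangle and a frame interval.
    from dataclasses import dataclass
    from typing import Dict, Tuple

    @dataclass
    class Box3D:
        x_min: int
        y_min: int
        x_max: int
        y_max: int
        t_start: int  # first frame index containing the caption
        t_end: int    # last frame index (inclusive)

    def slice_to_2d(box: Box3D) -> Dict[int, Tuple[int, int, int, int]]:
        """Map each frame index in [t_start, t_end] to the same 2D rectangle."""
        rect = (box.x_min, box.y_min, box.x_max, box.y_max)
        return {t: rect for t in range(box.t_start, box.t_end + 1)}

    if __name__ == "__main__":
        caption = Box3D(x_min=40, y_min=400, x_max=600, y_max=460,
                        t_start=120, t_end=240)
        per_frame = slice_to_2d(caption)
        # Every frame receives exactly the same rectangle, so the detected
        # caption region cannot jitter or drift over the caption's lifetime.
        assert len(set(per_frame.values())) == 1

In contrast, per-frame 2D detection followed by temporal smoothing can still leave small frame-to-frame differences in box position and size, which is the inconsistency the 3D formulation avoids by construction.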