A Novel Video Caption Detection Approach Using Multi-Frame Integration

Authors:
Rongrong Wang;Wanjun Jin;Lide Wu
Affiliations:
Fudan University, Shanghai, China;Fudan University, Shanghai, China;Fudan University, Shanghai, China
Venue:
ICPR '04: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 1
Year:
2004

Citing 0
Cited 8

Text detection, localization, and tracking in compressed video
Image Communication

A Novel Video Text Detection and Localization Approach
PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing

Localizing and Extracting Caption in News Video Using Multi-Frame Average
Proceedings of the 2008 conference on New Trends in Multimedia and Network Information Systems

Extracting text information for content-based video retrieval
MMM'08 Proceedings of the 14th international conference on Advances in multimedia modeling

Precise news video text detection/localization based on multiple frames integration
ISCGAV'10 Proceedings of the 10th WSEAS international conference on Signal processing, computational geometry and artificial vision

A video text detection method based on key text points
PCM'10 Proceedings of the 11th Pacific Rim conference on Advances in multimedia information processing: Part I

Localization and recognition of the scoreboard in sports video based on SIFT point matching
MMM'11 Proceedings of the 17th international conference on Advances in multimedia modeling - Volume Part II

Robust news video text detection based on edges and line-deletion
WSEAS Transactions on Signal Processing

Abstract

Captions in videos often play an important role in video information indexing and retrieval. In this paper, we present a novel video caption detection approach. We first apply a new Multiple Frames Integration (MFI) method to minimize the variation of the image background. A time-based minimum (or maximum) pixel value search is employed, and a Sobel edge map is used to determine the mode of search. Block-based text detection is then performed: a small window scans the image, and each window is classified as text or non-text using Sobel edges as features. We use a two-level pyramid to detect various text sizes. Finally, we present a new iterative text line decomposition method that extracts accurate text bounding boxes from candidate text areas. Experimental results show that the proposed approach achieves high precision and recall.
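
The first two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the Sobel edge map selects the search mode, so here the caller passes the mode explicitly. The function names (`integrate_frames`, `sobel_magnitude`, `text_blocks`), the nested-list frame representation, the window size, and both thresholds are assumptions made for illustration only. The two-level pyramid and the text line decomposition step are not covered.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale image: rows of pixel intensities (0-255)


def integrate_frames(frames: List[Frame], mode: str) -> Frame:
    """Per-pixel minimum or maximum over time across equally sized frames.

    A bright, static caption survives a "min" search while a changing
    background is pushed toward its darkest value; "max" is the mirror case
    for dark captions. Choosing the mode is left to the caller (assumption).
    """
    if not frames:
        raise ValueError("need at least one frame")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [pick(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]


def sobel_magnitude(img: Frame) -> Frame:
    """Sobel gradient magnitude approximated as |gx| + |gy|.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def text_blocks(edges: Frame, block: int = 4, edge_thresh: int = 200,
                density: float = 0.25) -> List[Tuple[int, int]]:
    """Scan non-overlapping block x block windows over an edge map.

    A window is flagged as a text candidate when the fraction of pixels with
    edge strength >= edge_thresh reaches `density`. Both thresholds are
    hypothetical; the paper's actual classifier is not given in the abstract.
    Returns the (row, column) of each flagged window's top-left corner.
    """
    h, w = len(edges), len(edges[0])
    hits = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            strong = sum(
                1
                for dy in range(block)
                for dx in range(block)
                if edges[y + dy][x + dx] >= edge_thresh
            )
            if strong / (block * block) >= density:
                hits.append((y, x))
    return hits
```

A typical call chain under these assumptions would be `text_blocks(sobel_magnitude(integrate_frames(frames, "min")))` for bright captions, with `"max"` used instead for dark captions.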