An Automatic Video Text Detection, Localization and Extraction Approach

Authors:
Chengjun Zhu;Yuanxin Ouyang;Lei Gao;Zhenyong Chen;Zhang Xiong
Affiliations:
School of Computer Science and Technology, Beihang University, Beijing, P.R. China (all authors)
Venue:
Advanced Internet Based Systems and Applications
Year:
2009

Abstract

Text in video is a compact and accurate clue for video indexing and summarization. This paper presents an algorithm that treats a word group as a special symbol and uses a support vector machine (SVM) to detect, localize, and extract video text automatically. First, four Sobel operators are applied to obtain the edge map (EM) of the video frame, and the EM is segmented into blocks of size N×2N. Character features and character-group structure features are then extracted from each block to construct a 19-dimensional feature vector, and a pre-trained SVM classifies each block as text or non-text. Second, a dilatation-shrink process is employed to adjust the text position. Finally, text regions are enhanced using information from multiple frames. After the enhanced text region is binarized, the resulting text region with a clean background is recognized by OCR software. Experimental results show that the proposed method can detect, localize, and extract video text with high accuracy.
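
The first stage described above (edge map from four Sobel operators, then segmentation into N×2N blocks) can be illustrated with a minimal sketch. The abstract does not specify the kernels, how the four responses are combined, or the block orientation, so the following are assumptions made only for illustration: the four kernels are the horizontal, vertical, and two diagonal Sobel masks; the responses are combined by a per-pixel maximum of absolute values; and a block is N rows by 2N columns with no overlap. The names `filter3x3`, `edge_map`, and `split_blocks` are hypothetical and do not come from the paper.

```python
import numpy as np

# Assumed set of four 3x3 Sobel masks (the paper's exact masks are not given here).
SOBEL_KERNELS = [
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),   # responds to vertical edges
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),   # responds to horizontal edges
    np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float),   # one diagonal direction
    np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float),   # the other diagonal direction
]


def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def edge_map(gray):
    """Combine the four directional responses (assumption: per-pixel maximum)."""
    responses = [np.abs(filter3x3(gray, k)) for k in SOBEL_KERNELS]
    return np.max(responses, axis=0)


def split_blocks(em, n):
    """Return (row, col, block) tuples for non-overlapping n-by-2n blocks.

    Assumption: n is the block height and 2n the block width; partial
    blocks at the right and bottom borders are dropped.
    """
    h, w = em.shape
    blocks = []
    for r in range(0, h - n + 1, n):
        for c in range(0, w - 2 * n + 1, 2 * n):
            blocks.append((r, c, em[r:r + n, c:c + 2 * n]))
    return blocks


# Usage on a synthetic frame: dark left half, bright right half.
frame = np.zeros((32, 64))
frame[:, 32:] = 255
em = edge_map(frame)
blocks = split_blocks(em, 16)
```

Each block produced this way would then be turned into a feature vector and passed to the classifier; the 19 features themselves are not listed in the abstract, so they are not sketched here.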