Automatic Caption Localization in Compressed Video

Authors:
Yu Zhong;Hongjiang Zhang;Anil K. Jain
Affiliations:
Carnegie Mellon Univ., Pittsburgh, PA;Hewlett Packard Co., Palo Alto, CA;Michigan State Univ., East Lansing
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2000

Abstract

We present a method to automatically localize captions in JPEG compressed images and the I-frames of MPEG compressed videos. Caption text regions are segmented from background images using their distinguishing texture characteristics. Unlike previously published methods, which fully decompress the video sequence before extracting the text regions, this method locates candidate caption text regions directly in the DCT compressed domain using the intensity variation information encoded in the DCT coefficients. Therefore, only a very small amount of decoding is required. The proposed algorithm takes about 0.006 seconds to process a 240 × 350 image and achieves a recall rate of 99.17 percent while falsely accepting about 1.87 percent of nontext DCT blocks on a variety of MPEG compressed videos containing more than 2,300 I-frames.
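
The idea of finding candidate text blocks from DCT coefficients alone can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which coefficients or thresholds are used, so the choice of the first-row AC coefficients as a horizontal-variation measure, the function names `horizontal_energy` and `candidate_text_blocks`, and the threshold value are all assumptions made for illustration. The `dct2` helper exists only to fabricate input; in a real decoder the coefficients would be read from the partially decoded JPEG or MPEG I-frame.

```python
import math

N = 8  # block size used by JPEG and MPEG I-frames


def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (helper to fabricate demo input)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            s = 0.0
            for x in range(N):      # row
                for y in range(N):  # column
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def horizontal_energy(coeffs, max_freq=7):
    """Assumed texture measure: sum of |AC coefficients| in the first row,
    which responds to intensity variation along the horizontal direction."""
    return sum(abs(coeffs[0][v]) for v in range(1, max_freq + 1))


def candidate_text_blocks(coeff_grid, threshold):
    """Mark each block whose horizontal energy reaches the (assumed) threshold."""
    return [[horizontal_energy(c) >= threshold for c in row] for row in coeff_grid]


# Usage: one flat block and one block with vertical stripes (strong
# horizontal intensity variation, as in a row of character strokes).
flat = [[128] * N for _ in range(N)]
stripes = [[255 if y % 2 else 0 for y in range(N)] for _ in range(N)]
grid = [[dct2(flat), dct2(stripes)]]
print(candidate_text_blocks(grid, threshold=100.0))  # [[False, True]]
```

Because the measure reads only a handful of coefficients per block, no inverse DCT is needed, which is consistent with the abstract's remark that only a small amount of decoding is required. A practical detector would still need further steps to turn the block mask into caption regions and to suppress false positives.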