A comprehensive neural-based approach for text recognition in videos using natural language processing

Authors:
Khaoula Elagouni;Christophe Garcia;Pascale Sébillot
Affiliations:
Orange Labs R&D, rue du Clos Courtel, Cesson-Sévigné Cedex, France;LIRIS, Insa de Lyon, Bât. Jules Verne, Villeurbanne Cedex, France;IRISA, Insa de Rennes, Campus de Beaulieu, Rennes Cedex, France
Venue:
Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Year:
2011

Abstract

This work aims to improve multimedia content understanding by exploiting the textual clues embedded in digital videos. To this end, we developed a complete video Optical Character Recognition (OCR) system, specifically adapted to detect and recognize text embedded in videos. Based on a neural approach, this new method outperforms related work, especially in terms of robustness to variability in style and size, to background complexity, and to low image resolution. A language model that drives several steps of the video OCR is also introduced in order to remove ambiguities caused by local, letter-by-letter recognition and to reduce segmentation errors. The approach has been evaluated on a database of French TV news videos and achieves a character recognition rate of 95%, corresponding to 78% of words correctly recognized, which enables its incorporation into an automatic video indexing and retrieval system.
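
To illustrate how a language model can remove the ambiguities of letter-by-letter recognition, here is a minimal sketch in Python. It is not the authors' system: the abstract does not specify the model, so the bigram model, the Viterbi search, the function names (`greedy_decode`, `viterbi_decode`), the `lm_weight` and `floor` parameters, and all probabilities below are assumptions made up for illustration. The sketch combines hypothetical per-character classifier scores with character-bigram probabilities and keeps the best-scoring string.

```python
import math


def greedy_decode(char_probs):
    """Local letter-by-letter decoding: pick the top-scoring character at each position."""
    return "".join(max(position, key=position.get) for position in char_probs)


def viterbi_decode(char_probs, bigram, lm_weight=1.0, floor=1e-6):
    """Decode with a character-bigram language model (illustrative assumption).

    char_probs: list of dicts {character: classifier probability > 0}, one per position.
    bigram:     dict {(previous, current): probability}; unseen pairs get `floor`.
    lm_weight:  relative weight of the language model in the combined log-score.
    """
    # best[c] = (log-score, best string so far that ends in character c)
    best = {c: (math.log(p), c) for c, p in char_probs[0].items()}
    for position in char_probs[1:]:
        new_best = {}
        for cur, p in position.items():
            candidates = []
            for prev, (score, text) in best.items():
                transition = math.log(bigram.get((prev, cur), floor))
                candidates.append(
                    (score + lm_weight * transition + math.log(p), text + cur)
                )
            new_best[cur] = max(candidates)
        best = new_best
    return max(best.values())[1]


# Hypothetical classifier output for the French word "LE":
# the first glyph is slightly more likely to be read as "I" than as "L".
char_probs = [{"L": 0.45, "I": 0.55}, {"E": 0.90, "F": 0.10}]

# Hypothetical bigram probabilities: "LE" is a common pair, "IE" a rare one.
bigram = {("L", "E"): 0.3, ("I", "E"): 0.02, ("L", "F"): 0.001, ("I", "F"): 0.01}

print(greedy_decode(char_probs))           # local decision only
print(viterbi_decode(char_probs, bigram))  # decision corrected by the language model
```

With these made-up numbers the local decision yields "IE", while the language-model-weighted search recovers "LE", because the low bigram probability of "IE" outweighs the small advantage that the classifier gives to "I".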