Text recognition in videos using a recurrent connectionist approach

  • Authors:
  • Khaoula Elagouni; Christophe Garcia; Franck Mamalet; Pascale Sébillot

  • Affiliations:
  • Orange Labs R&D, Cesson Sévigné, France, and IRISA, INSA de Rennes, Rennes, France; LIRIS, INSA de Lyon, Villeurbanne, France; Orange Labs R&D, Cesson Sévigné, France; IRISA, INSA de Rennes, Rennes, France

  • Venue:
  • ICANN'12: Proceedings of the 22nd International Conference on Artificial Neural Networks and Machine Learning, Part II
  • Year:
  • 2012

Abstract

Most OCR (Optical Character Recognition) systems developed to recognize texts embedded in multimedia documents segment the text into characters before recognizing them. In this paper, we propose a novel approach that avoids any explicit character segmentation. Using a multi-scale scanning scheme, texts extracted from videos are first represented by sequences of learnt features. The obtained representations then feed a connectionist recurrent model specifically designed to take into account dependencies between successive learnt features and to recognize texts. The proposed video OCR, evaluated on a database of TV news videos, achieves very high recognition rates. Experiments also demonstrate that, for our recognition task, learnt feature representations outperform hand-crafted features.
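
To make the pipeline described above concrete, the sketch below shows one plausible reading of it in PyTorch. The abstract does not name a framework, layer sizes, or the exact recurrent model, so the convolutional feature extractor, the bidirectional LSTM, the character-set size, and the CTC-style training objective are all assumptions made for illustration, not details taken from the paper; the sketch only mirrors the stated idea of feeding sequences of learnt window features to a recurrent model without explicit character segmentation.

```python
import torch
import torch.nn as nn

class SlidingWindowFeatures(nn.Module):
    """Small ConvNet applied to each window of the scanned text line,
    producing one learnt feature vector per horizontal position."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, windows):          # windows: (T, 1, H, W) for one text line
        return self.net(windows)         # (T, feat_dim)

class RecurrentTextRecognizer(nn.Module):
    """Bidirectional recurrent layer over the feature sequence, followed by a
    per-step classifier; trained with CTC so no character segmentation is needed."""
    def __init__(self, feat_dim=64, hidden=128, n_classes=37):  # 36 symbols + blank (assumed)
        super().__init__()
        self.extract = SlidingWindowFeatures(feat_dim)
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=False)
        self.classify = nn.Linear(2 * hidden, n_classes)

    def forward(self, windows):
        feats = self.extract(windows).unsqueeze(1)   # (T, 1, feat_dim)
        out, _ = self.rnn(feats)                     # (T, 1, 2*hidden)
        return self.classify(out).log_softmax(-1)    # (T, 1, n_classes) for CTCLoss

# Hypothetical usage: 40 overlapping windows cut from one extracted text line.
windows = torch.randn(40, 1, 32, 32)
log_probs = RecurrentTextRecognizer()(windows)
loss = nn.CTCLoss(blank=0)(log_probs,
                           torch.tensor([[1, 5, 9]]),   # dummy target label indices
                           torch.tensor([40]),          # input length
                           torch.tensor([3]))           # target length
```

The design choice the sketch is meant to highlight is the one the abstract argues for: the recurrent layer sees the whole sequence of learnt window features, so dependencies between successive positions are modelled and character boundaries never have to be located explicitly.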