A framework for the assessment of text extraction algorithms on complex colour images

Authors:
A. Clavelli;D. Karatzas;J. Lladós
Affiliations:
Universitat Autònoma de Barcelona, Edifici O, Bellaterra, Spain;Universitat Autònoma de Barcelona, Edifici O, Bellaterra, Spain;Universitat Autònoma de Barcelona, Edifici O, Bellaterra, Spain
Venue:
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Year:
2010

Abstract

The availability of open, ground-truthed datasets and clear performance metrics is a crucial factor in the development of an application domain. The domain of colour text image analysis (real scenes, Web and spam images, scanned colour documents) has traditionally suffered from the lack of a comprehensive performance evaluation framework. Such a framework is extremely difficult to specify, and the corresponding pixel-level accurate information is tedious to define. In this paper we discuss the challenges and technical issues associated with developing such a framework. We then describe a complete framework for the evaluation of text extraction methods at multiple levels, provide a detailed ground-truth specification, and present a case study on how this framework can be used in a real-life situation.