Robust text and drawing segmentation algorithm for historical documents

Authors:
Rafi Cohen;Abedelkadir Asi;Klara Kedem;Jihad El-Sana;Itshak Dinstein
Affiliations:
Ben-Gurion University of the Negev;Ben-Gurion University of the Negev;Ben-Gurion University of the Negev;Ben-Gurion University of the Negev;Ben-Gurion University of the Negev
Venue:
Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing
Year:
2013

Abstract

We present a method for segmenting historical document images into regions of different content. First, we separate text elements from non-text elements using a binarized version of the document. Then, we refine the segmentation of the non-text regions into drawings, background, and noise. At this stage, spatial and color features are exploited to guarantee coherent regions in the final segmentation. Experiments show that the proposed approach achieves better segmentation quality than other methods. We evaluate segmentation quality on 252 pages of a historical manuscript, on which the proposed method achieves about 92% and 90% segmentation accuracy for drawings and text elements, respectively.
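
The first stage described above (binarize, then separate text from non-text) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how binarization or the text/non-text decision is performed, so the fixed threshold, the 4-connected component labeling, and the bounding-box size rule below are all assumptions chosen for illustration. The function names and the `threshold` and `max_text_size` parameters are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark pixels darker than `threshold` as ink (True).

    Assumption: a fixed global threshold; the paper's binarization
    method is not specified in the abstract.
    """
    return [[value < threshold for value in row] for row in gray]


def connected_components(binary):
    """Return 4-connected ink components as lists of (row, col) pixels."""
    if not binary:
        return []
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def split_text_nontext(components, max_text_size=3):
    """Split components into (text, non_text) by bounding-box size.

    Hypothetical rule: a component whose bounding box is at most
    `max_text_size` pixels on its longer side is a text candidate.
    """
    text, non_text = [], []
    for pixels in components:
        rows = [y for y, _ in pixels]
        cols = [x for _, x in pixels]
        box_height = max(rows) - min(rows) + 1
        box_width = max(cols) - min(cols) + 1
        if max(box_height, box_width) <= max_text_size:
            text.append(pixels)
        else:
            non_text.append(pixels)
    return text, non_text
```

In this sketch the non-text components would be the input to the second stage, where the paper refines them into drawings, background, and noise using spatial and color features; that refinement is not modeled here.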