Truthing for Pixel-Accurate Segmentation

Authors:
Michael A. Moll;Henry S. Baird;Chang An
Affiliations:
-;-;-
Venue:
DAS '08 Proceedings of the 2008 The Eighth IAPR International Workshop on Document Analysis Systems
Year:
2008

Abstract

We discuss problems in developing policies for ground truthing document images for pixel-accurate segmentation. First, we describe ground truthing policies that apply to four different scales: (1) paragraph, (2) text line, (3) character, and (4) pixel. We then analyze difficult or ambiguous cases that will challenge any policy, e.g., blank space and overlapping content. Experiments have shown the benefit of using "tighter" zones that capture more detail (e.g., at the text line level instead of the paragraph level). We show that tighter ground truth significantly improves classification results, by 45% in recent experiments. It is important to face the fact that a pixel-accurate segmentation can be better than manually obtained ground truth. In practice, perfectly accurate pixel-level ground truth may not be achievable, but we believe it is important to explore methods to semi-automatically improve existing ground truth.
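
To make the idea of "tighter" zones concrete, the following minimal sketch rasterizes rectangular ground-truth zones into a per-pixel label mask and compares a paragraph-level zone with two text-line zones on a toy page. It is an illustration only and not the authors' method: the zone format, the label names, the function names (`rasterize`, `count_label`, `pixel_agreement`), and the 10x20 page are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical zone format: (top, left, bottom, right, label),
# with bottom and right exclusive. Not taken from the paper.
Zone = Tuple[int, int, int, int, str]
Mask = List[List[str]]


def rasterize(zones: List[Zone], height: int, width: int,
              background: str = "blank") -> Mask:
    """Turn rectangular zones into a per-pixel label mask."""
    mask = [[background] * width for _ in range(height)]
    for top, left, bottom, right, label in zones:
        # Clip each zone to the page before painting its label.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                mask[y][x] = label
    return mask


def count_label(mask: Mask, label: str) -> int:
    """Count the pixels carrying a given label."""
    return sum(row.count(label) for row in mask)


def pixel_agreement(a: Mask, b: Mask) -> float:
    """Fraction of pixels on which two equally sized masks agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


# Toy 10x20 page with two text lines separated by two blank rows.
paragraph_zone: List[Zone] = [(1, 2, 9, 18, "text")]
line_zones: List[Zone] = [(1, 2, 4, 18, "text"), (6, 2, 9, 18, "text")]

loose = rasterize(paragraph_zone, 10, 20)
tight = rasterize(line_zones, 10, 20)

# Pixels labelled "text" by the paragraph zone but not by the line zones
# are the inter-line blank space that the looser truth mislabels.
extra = count_label(loose, "text") - count_label(tight, "text")
print(extra, pixel_agreement(loose, tight))
```

In this toy case the paragraph-level zone labels the blank rows between the two lines as text, so the two versions of the ground truth disagree on exactly those pixels. A pixel-level classifier scored against the looser truth would be penalized for correctly calling those pixels blank, which is one way a segmentation can be better than manually obtained ground truth.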