Over the past two decades, a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for and/or evaluated using (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, no suitable dataset exists with ground truth that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise, etc.). The most significant impediment is the creation of ground truth that is both accurate and flexible in representation, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of this ground truth has been demonstrated through its use in two international page segmentation competitions (ICDAR2003 and ICDAR2005).