We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described as a set of regions together with their types, output order, and related attributes, is matched against a pre-stored set of ground-truth regions. The system detects errors including misclassification, splitting, and merging of regions. Each error is weighted individually for a particular application, and a global estimate of segmentation quality is derived from the weighted errors. The system can be customized to benchmark specific aspects of segmentation (e.g., headline detection) and to reflect the type of error correction that might follow (e.g., re-typing). Segmentation ground-truth files are quickly and easily generated and edited using GroundsKeeper, an X Window-based tool that allows one to view a document, manually draw regions (arbitrary polygons) on it, and specify information about each region (e.g., type, parent).
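The region-based idea can be illustrated with a minimal Python sketch. It is not the system described above. It assumes axis-aligned rectangles instead of arbitrary polygons, and the names `Region`, `find_errors`, and `quality`, the overlap threshold `min_frac`, the example weights, and the scoring formula are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle with a type label (assumption: the
    abstract allows arbitrary polygons; rectangles keep this short)."""
    x0: int
    y0: int
    x1: int
    y1: int
    kind: str

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


def overlap(a: Region, b: Region) -> int:
    """Area of the intersection of two rectangles (0 if disjoint)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, w) * max(0, h)


def covers(t: Region, o: Region, min_frac: float) -> bool:
    """True if output region o covers at least min_frac of truth region t."""
    ov = overlap(t, o)
    return ov > 0 and ov >= min_frac * t.area()


def find_errors(truth, output, min_frac=0.1):
    """Return (error_name, index) pairs. The index refers to `truth`,
    except for "merged", where it refers to `output`."""
    errors = []
    for i, t in enumerate(truth):
        hits = [j for j, o in enumerate(output) if covers(t, o, min_frac)]
        if not hits:
            errors.append(("missed", i))
        elif len(hits) > 1:
            errors.append(("split", i))
        elif output[hits[0]].kind != t.kind:
            errors.append(("misclassified", i))
    for j, o in enumerate(output):
        hits = [i for i, t in enumerate(truth) if covers(t, o, min_frac)]
        if len(hits) > 1:
            errors.append(("merged", j))
    return errors


def quality(errors, weights, n_truth):
    """Hypothetical global score in [0, 1]: one minus the weighted
    error count per ground-truth region, clipped at zero."""
    penalty = sum(weights.get(name, 1.0) for name, _ in errors)
    return max(0.0, 1.0 - penalty / max(1, n_truth))


# Ground truth: a headline and a text block.
truth = [Region(0, 0, 100, 20, "headline"), Region(0, 30, 100, 100, "text")]
# Segmenter output: the headline is split in two, the text is mislabeled.
output = [
    Region(0, 0, 50, 20, "headline"),
    Region(50, 0, 100, 20, "headline"),
    Region(0, 30, 100, 100, "graphics"),
]
# Application-specific weights (illustrative values only).
weights = {"split": 0.5, "misclassified": 1.0, "merged": 0.5, "missed": 1.0}

errors = find_errors(truth, output)
score = quality(errors, weights, len(truth))
print(errors, score)
```

Changing the `weights` dictionary is how such a sketch would reflect different applications: for example, a re-typing workflow might penalize misclassification lightly, while a headline-detection benchmark might weight errors on headline regions only.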