An Optimization Methodology for Document Structure Extraction on Latin Character Documents
IEEE Transactions on Pattern Analysis and Machine Intelligence
Consensus-Based Table Form Recognition
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
EURASIP Journal on Applied Signal Processing
This paper introduces scoring methods developed to automatically assess the performance of document recognition systems, specifically to evaluate the spatial correspondence of zones produced by a document segmentor. Two different approaches are discussed. The first approach (based on zone overlap and nearest neighbors) is better suited to merged zones, whereas the second approach (based on zone alignments) is better suited to nested zones (such as those found in tables and graphs). Definitions of coverage and efficiency error are presented, and scoring results on real system output are provided that validate the usefulness of these methods for comparing different document recognition algorithms. Currently, no standard testing procedures exist for measuring and comparing algorithms within a complex document recognition system. Scoring methods like the ones introduced in this paper serve as design and validation tools, expediting the development and deployment of document analysis technology for system developers and end users.
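To make the overlap-based idea concrete, the following is a minimal sketch of how a reference zone might be scored against segmentor output. It is an illustration only: the abstract does not give the paper's definitions of coverage and efficiency error, so the `Zone` type, the `coverage_error` formula (uncovered fraction of the reference area), and the `fragment_count` helper (a rough stand-in for an efficiency-style measure) are all assumptions introduced here, not the authors' method. Zones are assumed to be axis-aligned rectangles, and hypothesis zones are assumed not to overlap one another.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Zone:
    """Hypothetical axis-aligned zone; coordinates grow right and down."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        width = max(0.0, self.right - self.left)
        height = max(0.0, self.bottom - self.top)
        return width * height


def overlap_area(a: Zone, b: Zone) -> float:
    """Area of the rectangle shared by two zones, or 0.0 if they are disjoint."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return width * height if width > 0 and height > 0 else 0.0


def coverage_error(reference: Zone, hypotheses: Iterable[Zone]) -> float:
    """Assumed definition: fraction of the reference zone left uncovered.

    Assumes the hypothesis zones do not overlap each other, so their
    overlaps with the reference can simply be summed.
    """
    ref_area = reference.area()
    if ref_area == 0.0:
        raise ValueError("reference zone has zero area")
    covered = sum(overlap_area(reference, h) for h in hypotheses)
    return 1.0 - min(1.0, covered / ref_area)


def fragment_count(reference: Zone, hypotheses: Iterable[Zone]) -> int:
    """Assumed efficiency-style measure: how many hypothesis zones touch the reference."""
    return sum(1 for h in hypotheses if overlap_area(reference, h) > 0)


# Example: one reference zone split into two hypothesis zones, plus an unrelated zone.
reference = Zone(0, 0, 10, 10)
hypotheses = [Zone(0, 0, 10, 5), Zone(0, 5, 10, 8), Zone(20, 20, 30, 30)]
error = coverage_error(reference, hypotheses)
fragments = fragment_count(reference, hypotheses)
```

In this example the two overlapping hypothesis zones cover 80 of the reference zone's 100 area units, and the third zone contributes nothing. A real scorer would also need to handle overlapping hypothesis zones and the nested zones that the alignment-based approach targets, which this sketch does not attempt.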