Machine Learning - Special issue on learning with probabilistic representations
Morphological Tagging Approach in Document Analysis of Invoices
ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 1 - Volume 01
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
On the Reading of Tables of Contents
DAS '08 Proceedings of the 2008 The Eighth IAPR International Workshop on Document Analysis Systems
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
Structure extraction from PDF-based book documents
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Hi-index | 0.00 |
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For example, in invoices, the names, unit prices, quantities and other descriptors of every line item are laid out in a consistent spatial structure. We propose a general method for extracting such repeated structure from documents. After receiving a single example of the structure to be found, the proposed method localizes additional instances of this structure in the same document and in additional documents. A wide variety of perceptually motivated cues (such as alignment and saliency) is used for this purpose. These cues are combined in a probabilistic model, and a novel algorithm for exact inference in this model is proposed and used. We demonstrate that this method can cope with complex instances of repeated structure and generalizes successfully across a wide range of structure variations.