Universal Data Capture Technology from Semi-structured Form
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Multi-page document analysis based on format consistency and clustering
International Journal of Computer Applications in Technology
In this paper, we present a method for the logical labelling of physical rectangles extracted from invoices, based on a Conceptual Model that describes the invoice universe as generally as possible. This general knowledge is used in the semi-automatic construction of a model for each class of invoices. Once a model is constructed, it can be applied to understand an invoice instance, whose class is uniquely identified by its logo. This approach is used to design a flexible system that is able to learn, from a nucleus of general knowledge, a monotonic set of specific knowledge for each class of invoices (Document Models), expressed as the physical coordinates of each rectangle and its related semantic label.
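As a rough illustration of the idea of a Document Model, the sketch below stores, for each invoice class, a list of rectangles with semantic labels and then labels the rectangles of a new invoice whose class is already known from its logo. It is only a minimal sketch under stated assumptions: the names `LabelledRect`, `iou`, and `label_invoice`, the logo identifier strings, and the use of intersection-over-union with a 0.5 threshold to match rectangles are all hypothetical choices made for illustration, not the matching procedure described in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A rectangle in page coordinates: (x0, y0, x1, y1). Assumed representation.
Rect = Tuple[float, float, float, float]


@dataclass
class LabelledRect:
    """One entry of a hypothetical Document Model: coordinates plus label."""
    rect: Rect
    label: str


def iou(a: Rect, b: Rect) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_invoice(
    models: Dict[str, List[LabelledRect]],
    logo_id: str,
    rects: List[Rect],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Optional[str]]:
    """Assign a semantic label to each extracted rectangle.

    The invoice class is selected by its logo identifier; each rectangle
    takes the label of the best-overlapping model rectangle, or None if
    no model rectangle overlaps it by at least `threshold`.
    """
    model = models.get(logo_id)
    if not model:
        return [None] * len(rects)
    labels: List[Optional[str]] = []
    for r in rects:
        best = max(model, key=lambda m: iou(r, m.rect))
        labels.append(best.label if iou(r, best.rect) >= threshold else None)
    return labels


# Hypothetical model for one invoice class, keyed by an invented logo id.
models = {
    "acme": [
        LabelledRect((0, 0, 100, 20), "invoice_number"),
        LabelledRect((0, 30, 100, 50), "total"),
    ]
}
labels = label_invoice(models, "acme", [(2, 1, 98, 21), (0, 200, 10, 210)])
```

Rectangles that stay unlabelled (`None`) would be natural candidates for the semi-automatic step, in which an operator supplies the label and the class model grows accordingly.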