A Generic System for Processing Invoices
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
A Two Level Knowledge Approach for Understanding Documents of a Multi-Class Domain
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Near-wordless document structure classification
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1)
ICAISC'10 Proceedings of the 10th international conference on Artificial intelligence and soft computing: Part II
This research aims to reverse engineer how data is encoded in structured documents and then to automate its extraction. We consider a broad category of structured documents that goes beyond form processing: the documents may have flexible layouts and span multiple, varying numbers of pages. The data extraction method (DataX) employs general templates generated by the Inductive Template Generator (InTeGen). InTeGen learns these templates inductively from example documents in which the data elements have been identified. Both methods achieve high automation with minimal user input.
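The idea of inducing a template from annotated examples and then applying it to unseen documents can be illustrated with a toy sketch. It is not the InTeGen or DataX algorithm, whose details the abstract does not give. It assumes a document is a flat list of tokens, that each example marks the position of every data element, and that a "template" is simply the token that precedes a field's value in every example. The names `induce_template`, `extract`, and the sample invoices are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An example is (tokens, {field name: index of the token holding its value}).
Example = Tuple[List[str], Dict[str, int]]


def induce_template(examples: List[Example]) -> Dict[str, str]:
    """Toy induction step (assumption, not InTeGen itself): for each field,
    keep the token that precedes the value in every annotated example."""
    candidates: Dict[str, Set[str]] = {}
    for tokens, fields in examples:
        for name, idx in fields.items():
            # A value at position 0 has no preceding token to use as an anchor.
            anchors = {tokens[idx - 1]} if idx > 0 else set()
            if name in candidates:
                candidates[name] &= anchors  # generalize by intersection
            else:
                candidates[name] = anchors
    # Only fields with exactly one consistent anchor enter the template.
    return {name: next(iter(s)) for name, s in candidates.items() if len(s) == 1}


def extract(template: Dict[str, str], tokens: List[str]) -> Dict[str, str]:
    """Toy extraction step (assumption, not DataX itself): return the token
    that follows the first occurrence of each field's anchor."""
    found: Dict[str, str] = {}
    for name, anchor in template.items():
        for i, tok in enumerate(tokens[:-1]):
            if tok == anchor:
                found[name] = tokens[i + 1]
                break
    return found


# Two hypothetical invoices with different layouts and identified data elements.
examples: List[Example] = [
    ("Invoice No: 1001 Date: 2010-01-05 Total: 250.00".split(),
     {"number": 2, "date": 4, "total": 6}),
    ("Date: 2010-02-11 Invoice No: 1002 Shipping: 5.00 Total: 80.50".split(),
     {"date": 1, "number": 4, "total": 8}),
]

template = induce_template(examples)
unseen = "Total: 12.00 Invoice No: 1003 Date: 2010-03-01".split()
result = extract(template, unseen)
```

Because the anchors are learned from context rather than from fixed positions, the sketch tolerates the reordered fields in the unseen document, which loosely mirrors the flexible layouts described above. A realistic system would need far richer context than a single preceding token.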