Universal Data Capture Technology from Semi-structured Forms

  • Authors:
  • Diar Tuganbaev;Aram Pakhchanian;Dmitry Deryagin

  • Affiliations:
  • ABBYY Software House, Moscow;ABBYY Software House, Moscow;ABBYY Software House, Moscow

  • Venue:
  • ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
  • Year:
  • 2005

Abstract

This paper describes a universal technology for automated data capture from documents that carry similar data in different layouts, such as invoices, claim forms, résumés, contracts, and loan documents. Prior to data capture, the relevant data are located on the document image. A formalization of top-down document analysis is proposed, and a language for describing document structures is presented. Formalized descriptions in this language can be compiled into executable code. The process of matching such formalized descriptions against actual semi-structured documents in order to find the relevant data is described.
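
To make the idea of matching one structure description against differently laid-out documents concrete, here is a minimal sketch in Python. It is not the description language or the matcher presented in the paper; the names Word, FieldRule, find_field and match_document, the "right_of"/"below" relations, and the pixel tolerance are all assumptions chosen for illustration. Each rule names a label keyword, a pattern the value must match, and the spatial relations to try, and the matcher looks for the nearest matching word in each direction.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with the top-left corner of its bounding box (hypothetical model)."""
    text: str
    x: int
    y: int


@dataclass
class FieldRule:
    """Illustrative rule: a label keyword, a value pattern, and relations to try in order."""
    name: str
    label: str
    pattern: str
    relations: tuple = ("right_of", "below")


def find_field(rule, words, tol=10):
    """Return the text of the nearest pattern-matching word next to the label, or None."""
    for anchor in words:
        if anchor.text.lower() != rule.label.lower():
            continue
        for relation in rule.relations:
            candidates = []
            for w in words:
                if w is anchor or not re.fullmatch(rule.pattern, w.text):
                    continue
                # Same row, to the right of the label.
                if relation == "right_of" and abs(w.y - anchor.y) <= tol and w.x > anchor.x:
                    candidates.append((w.x - anchor.x, w))
                # Same column, below the label.
                elif relation == "below" and abs(w.x - anchor.x) <= tol and w.y > anchor.y:
                    candidates.append((w.y - anchor.y, w))
            if candidates:
                return min(candidates, key=lambda c: c[0])[1].text
    return None


def match_document(rules, words):
    """Apply every rule to one document and collect the captured fields."""
    return {rule.name: find_field(rule, words) for rule in rules}


# One description, applied to two layouts that carry the same data.
RULES = [
    FieldRule("date", "Date", r"\d{4}-\d{2}-\d{2}"),
    FieldRule("total", "Total", r"\d+\.\d{2}"),
]

# Layout A: values sit to the right of their labels.
layout_a = [Word("Date", 10, 20), Word("2005-08-29", 80, 20),
            Word("Total", 10, 100), Word("42.50", 80, 102)]

# Layout B: values sit below their labels.
layout_b = [Word("Date", 200, 10), Word("Total", 300, 10),
            Word("2005-08-29", 200, 40), Word("42.50", 302, 40)]

result_a = match_document(RULES, layout_a)
result_b = match_document(RULES, layout_b)
```

Under these assumptions, both layouts yield the same captured fields from a single set of rules, which is the property the abstract attributes to a layout-independent description. A real system would work from recognized text blocks and a far richer set of structural elements and relations than this sketch models.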