Form Reading based on Form-type Identification and Form-data Recognition

  • Authors:
  • Hiroshi Sako;Minenobu Seki;Naohiro Furukawa;Hisashi Ikeda;Atsuhiro Imaizumi

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
  • Year:
  • 2003

Abstract

Form reading technology based on form-type identification and form-data recognition is proposed. This technology addresses the difficulty of reading a variety of items on a fairly large number of different types of forms. The form-type identification consists of two parts: (i) extraction of targets, such as important keywords in a form, by matching recognised characters against word strings in a keyword dictionary, and (ii) analysis of the positional or semantic relationships between the targets by constellation matching between these targets and the word location information in the keyword dictionary. The form-data recognition consists of two parts: (i) extraction of a region of interest (ROI) containing the character string of an item by using layout knowledge of that particular form type, and (ii) character string recognition of the item by using linguistic constraints obtained from content knowledge of the form type. An experiment using 642 sample forms of 107 different types in total confirmed that the form-type identification method can correctly identify 97% of the 642 form samples at a rejection rate of 3%. Another experiment confirmed that the form-data recognition method can correctly read 95% of the items on the form samples.
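
The two-part form-type identification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The dictionary contents, the normalised page coordinates, the translation-only alignment, the tolerance `tol`, and the rejection threshold `min_score` are all assumptions made for illustration, as are the names `KEYWORD_DICT`, `extract_targets`, `constellation_score`, and `identify_form_type`.

```python
from math import hypot

# Hypothetical keyword dictionary: form type -> {keyword: (x, y)}.
# Positions are normalised to the page (0..1); all values are invented.
KEYWORD_DICT = {
    "invoice": {"invoice": (0.10, 0.05), "total": (0.70, 0.90), "date": (0.75, 0.05)},
    "receipt": {"receipt": (0.10, 0.05), "total": (0.70, 0.60), "date": (0.10, 0.15)},
}


def extract_targets(ocr_words, layout):
    """Part (i): keep recognised words that match a keyword of this form type."""
    targets = {}
    for text, x, y in ocr_words:
        key = text.lower()
        if key in layout and key not in targets:
            targets[key] = (x, y)
    return targets


def constellation_score(targets, layout, tol=0.05):
    """Part (ii): fraction of dictionary keywords found in a consistent layout.

    Each target is tried as an anchor. The anchor fixes a page offset, and
    every other target counts as a hit if it lies within `tol` of its
    dictionary position after that offset is removed (assumed matching rule).
    """
    best_hits = 0
    for anchor, (ax, ay) in targets.items():
        dx = ax - layout[anchor][0]
        dy = ay - layout[anchor][1]
        hits = sum(
            1
            for key, (x, y) in targets.items()
            if hypot(x - dx - layout[key][0], y - dy - layout[key][1]) <= tol
        )
        best_hits = max(best_hits, hits)
    return best_hits / len(layout)


def identify_form_type(ocr_words, dictionary=KEYWORD_DICT, min_score=0.6):
    """Return (form_type, score); form_type is None when the form is rejected."""
    best_type, best_score = None, 0.0
    for form_type, layout in dictionary.items():
        score = constellation_score(extract_targets(ocr_words, layout), layout)
        if score > best_score:
            best_type, best_score = form_type, score
    if best_score < min_score:
        return None, best_score  # reject rather than guess
    return best_type, best_score
```

For example, `identify_form_type([("Invoice", 0.12, 0.06), ("Date", 0.77, 0.06), ("Total", 0.72, 0.91)])` selects the hypothetical `"invoice"` type even though the scan is slightly shifted, while a page containing only the word "Total" is rejected. Rejecting low-scoring forms mirrors the rejection rate reported in the abstract, although the threshold used here is arbitrary.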