Recognition assistance: treating errors in texts acquired from various recognition processes

  • Authors:
  • Gábor Prószéky;Mátyás Naszódi;Balázs Kis

  • Affiliations:
  • MorphoLogic Késmárki, Budapest, Hungary;MorphoLogic Késmárki, Budapest, Hungary;MorphoLogic Késmárki, Budapest, Hungary

  • Venue:
  • COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
  • Year:
  • 2002

Abstract

Texts acquired from recognition sources (continuous speech/handwriting recognition and OCR) generally contain three types of errors, regardless of the particular characteristics of the source. The output of the recognition process may (1) be poorly segmented or not segmented at all; (2) contain underspecified symbols, where the recognition process can only indicate that the symbol belongs to a specific group (e.g. shape codes); or (3) contain incorrectly identified symbols. The project presented in this paper addresses these errors by developing a unified linguistic framework, the MorphoLogic Recognition Assistant, that provides feedback and corrections for various recognition processes. The framework uses customized morpho-syntactic and syntactic analysis in which the lexicons and their alphabets correspond to the symbol set acquired from the recognition process. A successful framework must provide three services: (1) properly disambiguated segmentation, (2) disambiguation of underspecified symbols, and (3) correction of incorrectly recognized symbols. The paper outlines the methods of morpho-syntactic and syntactic post-processing currently in use.
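
To make the first two services concrete, the following is a minimal sketch, not the MorphoLogic Recognition Assistant itself. It assumes that each recognized symbol is delivered as a set of candidate characters (a single-element set when the symbol is fully specified, a larger set when it is underspecified) and that a plain word list stands in for the morphological lexicons. The names `LEXICON`, `spell`, `matches`, and `segmentations` are hypothetical, and correction of incorrectly recognized symbols is not covered.

```python
from typing import FrozenSet, List, Sequence, Set

# Assumption: a toy word list stands in for the framework's real lexicons.
LEXICON: Set[str] = {
    "a", "at", "hat", "he", "hen", "here", "in", "rein", "the", "then", "there",
}

# One recognized symbol = the set of characters it might stand for.
Symbol = FrozenSet[str]


def spell(text: str) -> List[Symbol]:
    """Turn plain text into a sequence of fully specified symbols."""
    return [frozenset(ch) for ch in text]


def matches(word: str, symbols: Sequence[Symbol], start: int) -> bool:
    """True if `word` is compatible with the symbols beginning at `start`."""
    if start + len(word) > len(symbols):
        return False
    return all(ch in symbols[start + i] for i, ch in enumerate(word))


def segmentations(
    symbols: Sequence[Symbol], lexicon: Set[str] = LEXICON
) -> List[List[str]]:
    """Return every split of the unsegmented symbol stream into lexicon words.

    Choosing a word also fixes a reading for each underspecified symbol it
    covers, so segmentation and symbol disambiguation happen together.
    """
    n = len(symbols)
    # tails[i] holds all word sequences that exactly cover symbols[i:].
    tails: List[List[List[str]]] = [[] for _ in range(n + 1)]
    tails[n] = [[]]  # the empty suffix has one (empty) segmentation
    for i in range(n - 1, -1, -1):
        for word in sorted(lexicon):  # sorted for a deterministic order
            if matches(word, symbols, i):
                for tail in tails[i + len(word)]:
                    tails[i].append([word] + tail)
    return tails[0]


if __name__ == "__main__":
    # Second symbol is underspecified: the recognizer only knows "h or b".
    stream = [frozenset("t"), frozenset("hb")] + spell("ehat")
    print(segmentations(stream))            # [['the', 'hat']]
    # A fully specified but unsegmented stream can still be ambiguous.
    print(segmentations(spell("therein")))  # [['the', 'rein'], ['there', 'in']]
```

When more than one segmentation survives, as in the second call, the lexicon alone cannot decide between them; this is the point at which the abstract's syntactic analysis would be needed to disambiguate.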