Recognition assistance: treating errors in texts acquired from various recognition processes

Authors:
Gábor Prószéky; Mátyás Naszódi; Balázs Kis
Affiliations:
MorphoLogic Késmárki, Budapest, Hungary; MorphoLogic Késmárki, Budapest, Hungary; MorphoLogic Késmárki, Budapest, Hungary
Venue:
COLING '02: Proceedings of the 19th International Conference on Computational Linguistics, Volume 2
Year:
2002

Abstract

Texts acquired from recognition sources (continuous speech and handwriting recognition, and OCR) generally contain three types of errors, regardless of the particular characteristics of the source. The output of the recognition process may (1) be poorly segmented or not segmented at all; (2) contain underspecified symbols, e.g. shape codes, where the recognition process can only indicate that a symbol belongs to a specific group; or (3) contain incorrectly identified symbols. The project presented in this paper addresses these errors by developing a unified linguistic framework, the MorphoLogic Recognition Assistant, that provides feedback and corrections for various recognition processes. The framework uses customized morpho-syntactic and syntactic analysis in which the lexicons and their alphabets correspond to the symbol set acquired from the recognition process. A successful framework must provide three services: (1) properly disambiguated segmentation, (2) disambiguation of underspecified symbols, and (3) correction of incorrectly recognized symbols. The paper outlines the methods of morpho-syntactic and syntactic post-processing currently in use.
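To make the three services concrete, here is a minimal Python sketch, under stated assumptions, of how segmentation, shape-code disambiguation, and symbol correction can be folded into a single lexicon-driven dynamic-programming pass. It is not the MorphoLogic Recognition Assistant's method. Everything in it is hypothetical: the simplified ascender/descender/x-height shape classes (`SHAPE_CLASS`), the toy word list (`LEXICON`), the substitution-only error model with a per-word budget (`max_errors`), and the helper names `encode`, `hamming`, and `analyse`. The paper's framework relies on morpho-syntactic and syntactic analysis rather than a flat word list.

```python
"""Toy lexicon-driven post-processor for shape-coded recognition output.

Hypothetical illustration only: the shape-code classes, the lexicon, and the
error model are assumptions, not the design described in the paper.
"""

# Hypothetical, simplified shape-code classes (assumption for illustration).
SHAPE_CLASS = {}
SHAPE_CLASS.update({ch: "A" for ch in "bdfhklt"})        # ascenders
SHAPE_CLASS.update({ch: "g" for ch in "gpqy"})           # descenders
SHAPE_CLASS.update({ch: "x" for ch in "acemnorsuvwxz"})  # x-height only
SHAPE_CLASS.update({"i": "i", "j": "j"})                 # dotted letters

# Hypothetical toy lexicon; the real framework uses morpho-syntactic analysis.
LEXICON = ["the", "cat", "sat", "on", "an", "in", "mat", "hat"]


def encode(word: str) -> str:
    """Map a lowercase word to the shape codes a recognizer would emit for it."""
    return "".join(SHAPE_CLASS[ch] for ch in word)


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length code strings differ."""
    return sum(x != y for x, y in zip(a, b))


def analyse(codes: str, lexicon: list[str], max_errors: int = 1):
    """Segment an unsegmented shape-code string into lexicon words.

    Returns (cost, slots) for the cheapest segmentation found, where cost is the
    number of codes treated as misrecognized and each slot is a sorted tuple of
    all words tied for the smallest distance on that span (the underspecified
    readings). Returns None if no segmentation fits the per-word error budget.
    Only substitutions are modelled; insertions and deletions are not.
    """
    # Pre-encode the lexicon, grouped by word length.
    by_len: dict[int, list[tuple[str, str]]] = {}
    for word in lexicon:
        by_len.setdefault(len(word), []).append((word, encode(word)))

    n = len(codes)
    # best[i] holds (cost, slots) for the cheapest analysis of codes[:i].
    best = [None] * (n + 1)
    best[0] = (0, [])
    for end in range(1, n + 1):
        for length, candidates in by_len.items():
            start = end - length
            if start < 0 or best[start] is None:
                continue
            span = codes[start:end]
            scored = [(hamming(span, shape), word) for word, shape in candidates]
            dist = min(d for d, _ in scored)
            if dist > max_errors:  # too many misrecognized codes for one word
                continue
            readings = tuple(sorted(w for d, w in scored if d == dist))
            cost = best[start][0] + dist
            # Strict '<' keeps the first-found path when costs tie.
            if best[end] is None or cost < best[end][0]:
                best[end] = (cost, best[start][1] + [readings])
    return best[n]


if __name__ == "__main__":
    text = "".join(encode(w) for w in ["the", "cat", "sat", "on", "the", "mat"])
    print(text, analyse(text, LEXICON))
    # One misrecognized code: the final 't' of "cat" came out as 'x'.
    print(analyse("AAxxxx", LEXICON))
```

On the first input the sketch recovers six slots: `the` is unambiguous, while the other slots keep all their shape-equivalent alternatives (for example `cat`, `mat`, `sat`). Choosing among them is the kind of disambiguation the abstract assigns to morpho-syntactic and syntactic analysis, so the sketch deliberately returns the alternatives instead of guessing. The second input is segmented as `the` plus a three-letter word at cost 1, i.e. one code treated as misrecognized. A real system would also keep competing segmentations of equal cost and model inserted or dropped symbols, both of which this sketch omits.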