ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Texts acquired from recognition sources (continuous speech/handwriting recognition and OCR) generally exhibit three types of errors, regardless of the characteristics of the particular source. The output of the recognition process may (1) be poorly segmented or not segmented at all; (2) contain underspecified symbols, where the recognition process can only indicate that the symbol belongs to a specific group (e.g. shape codes); or (3) contain incorrectly identified symbols. The project presented in this paper addresses these errors by developing a unified linguistic framework, the MorphoLogic Recognition Assistant, that provides feedback and corrections for various recognition processes. The framework uses customized morpho-syntactic and syntactic analysis in which the lexicons and their alphabets correspond to the symbol set acquired from the recognition process. A successful framework must provide three services: (1) properly disambiguated segmentation, (2) disambiguation of underspecified symbols, and (3) correction of incorrectly recognized symbols. The paper outlines the methods of morpho-syntactic and syntactic post-processing currently in use.
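
To make the first two services concrete, the following is a minimal illustrative sketch, not the MorphoLogic Recognition Assistant itself. It assumes a toy English lexicon and a hypothetical three-class shape-code alphabet (`A` for letters with ascenders, `x` for x-height letters, `g` for letters with descenders); the names `LEXICON`, `segmentations`, `shape_code` and `candidates_for_shape` are invented for this example. The actual framework relies on morpho-syntactic and syntactic analysis rather than plain word lookup.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system would use a morphological analyzer.
LEXICON = frozenset({"the", "them", "cat", "sat", "mat", "on", "a", "at"})

# Hypothetical shape-code classes (an assumption made for illustration only).
SHAPE_OF = {}
for code, letters in (("A", "bdfhklt"), ("x", "aceimnorsuvwxz"), ("g", "gjpqy")):
    for ch in letters:
        SHAPE_OF[ch] = code


def segmentations(text, lexicon):
    """Return every way to split unsegmented `text` into lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the input yields one empty continuation.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def shape_code(word):
    """Map a lowercase word to its string of shape-code symbols."""
    return "".join(SHAPE_OF[ch] for ch in word)


def candidates_for_shape(code, lexicon):
    """List the lexicon words compatible with an underspecified shape code."""
    return sorted(w for w in lexicon if shape_code(w) == code)


if __name__ == "__main__":
    print(segmentations("themat", LEXICON))
    print(candidates_for_shape("xxA", LEXICON))
```

Both functions return several candidates for ambiguous input: `"themat"` can be read as "the mat" or "them at", and the shape code `xxA` matches "cat", "mat" and "sat". Choosing among such candidates is the disambiguation step that the abstract assigns to morpho-syntactic and syntactic analysis.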