Using deep morphology to improve automatic error detection in Arabic handwriting recognition

Authors:
Nizar Habash;Ryan M. Roth
Affiliations:
Columbia University;Columbia University
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identifying erroneous words in HR output from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. Our best approach achieves a roughly 15% absolute increase in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma (i.e., citation form) models, help improve HR-error detection precisely where we expect them to: on semantically incoherent error words.
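
As an illustration of the evaluation measure mentioned in the abstract, the sketch below scores word-level error detection as a binary decision per word (error or not) and computes precision, recall and F-score for the error class. It assumes the balanced F1 variant, which the abstract does not specify. The function name detection_f_score and the toy labels are hypothetical and are not taken from the paper. An "absolute increase" of 15% means the F-score rises by 15 points (for example from 0.50 to 0.65), not by 15% of its previous value.

```python
def detection_f_score(gold, predicted):
    """Precision, recall and F1 for the 'error' class.

    gold, predicted: equal-length sequences of booleans, where True
    means the recognized word is (or is flagged as) an HR error.
    Assumption: balanced F1; the abstract only says "F-score".
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # errors correctly flagged
    fp = sum(1 for g, p in pairs if not g and p)      # correct words wrongly flagged
    fn = sum(1 for g, p in pairs if g and not p)      # errors that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy, made-up labels for six recognized words (not data from the paper).
gold = [True, False, True, False, False, True]
predicted = [True, True, False, False, False, True]
precision, recall, f1 = detection_f_score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the three flagged words are real errors and two of the three real errors are flagged, so precision, recall and F1 all equal 2/3.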