Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper, we isolate the task of identifying erroneous words in HR from the task of producing corrections for them. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to identify these errors automatically. Our best approach achieves an absolute increase of roughly 15% in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma (i.e., citation form) models, help improve HR-error detection precisely where we expect them to: on semantically incoherent error words.