Techniques for Language Identification for Hybrid Arabic-English Document Images

Authors:
Affiliations:
Venue:
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year:
2001

Citing 0
Cited 4

Script Identification Using Steerable Gabor Filters

ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Script and Language Identification in Noisy and Degraded Document Images

IEEE Transactions on Pattern Analysis and Machine Intelligence
Region growing based segmentation algorithm for typewritten and handwritten text recognition

Applied Soft Computing
Bangla/English script identification based on analysis of connected component profiles

DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Abstract: Because of the different characteristics of Arabic language and Romance and Anglo Saxon languages, recognition of documents written in hybrid of these languages requires that the language of the text to be identified priori to the recognition phase. In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script are presented and evaluated. These techniques addresses the language identification problem on the word level and on textline level. The characteristics of horizontal projection profiles as well as runlength histograms for text written in both languages are the basic features underlying these techniques. Solving this problem is very important in building bilingual document image analysis systems which are capable of processing documents containing hybrid Arabic/Romance and Anglo Saxon languages.