Language Identification of Character Images Using Machine Learning Techniques
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Journal of Systems Architecture: the EUROMICRO Journal
Script and Language Identification in Noisy and Degraded Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence
Script and language identification in degraded and distorted document images
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Language identification in degraded and distorted document images
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
The authors have extended existing methods to identify the language of an on-line document after its characters have been coded into 10 character classes based on visual characteristics. In particular, they exploit word bigrams and trigrams in two ways: a linear combination of score values and an expert-system approach. Knowledge about each language is acquired from a large number of on-line texts. Using a small set of rules, the expert system is more accurate than the linear combination and is more stable when parameter settings are varied.
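The abstract gives no code, class definitions, or scoring formula. The Python below is therefore only a minimal sketch of the linear-combination variant under stated assumptions. The character classes in `shape_code` are a hypothetical, reduced scheme and are not the authors' 10 classes. "Word bigrams and trigrams" is read as n-grams of consecutive shape-coded words. Each n-gram order is scored with an add-one-smoothed average log-probability, and the weights 0.4 and 0.6 are arbitrary. The expert-system variant is not sketched because its rules are not described.

```python
import math
from collections import Counter

# HYPOTHETICAL shape classes for illustration only; not the paper's 10 classes.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(ch: str) -> str:
    """Map one character to a coarse visual class (illustrative scheme)."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"  # extends above the x-height
    if ch in DESCENDERS:
        return "g"  # extends below the baseline
    if ch == "i":
        return "i"  # x-height letter with a dot
    if ch == "j":
        return "j"  # descender with a dot
    if ch.isalpha():
        return "x" if ch.isascii() else "U"  # plain vs. accented lowercase
    return "."  # punctuation and other symbols


def word_shapes(text: str) -> list[str]:
    """Replace every word by its shape-code string, e.g. 'the' -> 'AAx'."""
    return ["".join(shape_code(c) for c in word) for word in text.split()]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Consecutive n-grams of word-shape tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(corpora: dict[str, str], orders=(2, 3)) -> dict:
    """Count word-shape n-grams per language and record a shared vocabulary size."""
    counts = {lang: {n: Counter(ngrams(word_shapes(text), n)) for n in orders}
              for lang, text in corpora.items()}
    # +1 reserves probability mass for n-grams never seen in training
    vocab = {n: len(set().union(*(counts[lang][n] for lang in counts))) + 1
             for n in orders}
    return {"counts": counts, "vocab": vocab, "orders": orders}


def avg_log_prob(grams, table: Counter, vocab: int) -> float:
    """Mean add-one-smoothed log-probability of the document's n-grams."""
    if not grams:
        return 0.0  # too few words for this order; same value for every language
    total = sum(table.values())
    return sum(math.log((table[g] + 1) / (total + vocab)) for g in grams) / len(grams)


def identify(text: str, model: dict, weights=None) -> tuple[str, dict[str, float]]:
    """Linear combination of per-order scores; the highest combined score wins."""
    weights = weights or {2: 0.4, 3: 0.6}  # hypothetical weights, not from the paper
    shapes = word_shapes(text)
    scores = {}
    for lang, tables in model["counts"].items():
        scores[lang] = sum(
            weights[n] * avg_log_prob(ngrams(shapes, n), tables[n], model["vocab"][n])
            for n in model["orders"]
        )
    return max(scores, key=scores.get), scores


# Toy training texts stand in for the "large number of on-line texts".
model = train({
    "en": "the cat sat on the mat and the dog ran to the park",
    "fr": "le chat est sur le tapis et le chien court au parc",
})
print(identify("the cat sat on the mat", model)[0])  # en
```

Because only shape codes are compared, different words with the same silhouette (here "cat", "sat", and "mat" all become `xxA`) share statistics. This is the kind of ambiguity that makes richer language knowledge, or rule-based arbitration such as the expert system described above, worthwhile.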