Language identification of on-line documents using word shapes

Authors:
Nicola Nobile;Sabina Bergler;Ching Y. Suen;Sami Khoury
Venue:
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Year:
1997

Citing 0
Cited 5

Language Identification of Character Images Using Machine Learning Techniques
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

MEMPHIS: a mobile agent-based system for enabling acquisition of multilingual content and providing flexible format internet premium services
Journal of Systems Architecture: the EUROMICRO Journal

Script and Language Identification in Noisy and Degraded Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence

Script and language identification in degraded and distorted document images
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1

Language identification in degraded and distorted document images
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

Quantified Score

Hi-index	0.00

Abstract

The authors have extended existing methods to identify the language of an on-line document after its characters have been coded using 10 character classes based on visual characteristics. In particular, they exploit word bigrams and trigrams in both a linear combination of score values and an expert-system approach. Knowledge about each language was acquired from a large number of on-line texts. Using a small set of rules, the expert system outperforms the linear combination in accuracy and shows more stability when parameter settings are varied.
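
The linear-combination variant described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not list the 10 character classes, so `shape_code` uses a smaller, hypothetical set (ascender, descender, `i`, `j`, x-height, other), and the scoring rule (fraction of word-shape bigrams and trigrams seen in training, weighted by hypothetical parameters `w2` and `w3`) is a stand-in, not the authors' actual score. The expert-system variant is not sketched because its rules are not given here.

```python
from collections import Counter

# Hypothetical shape classes; the paper uses 10 classes that the abstract does not list.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(word):
    """Map each character of a word to a coarse visual shape class."""
    out = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            out.append("A")   # tall characters
        elif ch in DESCENDERS:
            out.append("g")   # characters reaching below the baseline
        elif ch == "i":
            out.append("i")
        elif ch == "j":
            out.append("j")
        elif ch.isalpha():
            out.append("x")   # x-height characters
        else:
            out.append("-")   # punctuation and anything else
    return "".join(out)


def ngrams(tokens, n):
    """Return all consecutive n-tuples of a token list."""
    return [tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)]


def train(text):
    """Collect word-shape bigram and trigram counts from training text."""
    shapes = [shape_code(w) for w in text.split()]
    return {n: Counter(ngrams(shapes, n)) for n in (2, 3)}


def score(text, model, w2=0.5, w3=0.5):
    """Weighted sum of the fractions of bigrams and trigrams known to the model."""
    shapes = [shape_code(w) for w in text.split()]
    total = 0.0
    for n, weight in ((2, w2), (3, w3)):
        grams = ngrams(shapes, n)
        if not grams:
            continue  # text too short for this n-gram order
        hits = sum(1 for g in grams if g in model[n])
        total += weight * hits / len(grams)
    return total


def identify(text, models, w2=0.5, w3=0.5):
    """Pick the language whose model gives the highest combined score."""
    return max(models, key=lambda lang: score(text, models[lang], w2, w3))
```

To use the sketch, build one model per language with `train` on sample text, collect the models in a dictionary keyed by language name, and pass that dictionary to `identify` together with the text to classify. The weights `w2` and `w3` stand in for the kind of parameter settings whose variation the abstract reports the linear combination to be sensitive to.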