Information Retrieval
Classifying Amharic news text using self-organizing maps
Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Automatic diacritic restoration for resource-scarce languages
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
Structural and syntactic techniques for recognition of ethiopic characters
SSPR'06/SPR'06 Proceedings of the 2006 joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
The Amharic language is the principal language of over 20 million people, mainly in Ethiopia. An extensive literature survey reveals no journal or conference papers on Amharic character recognition. The Amharic script has 33 basic characters, each with seven orders, giving 231 distinct characters, not including numbers and punctuation symbols. The characters are cursive but not connected and, unlike other cursive scripts, do not use dots. This paper describes the Amharic script and discusses the difficulties of applying conventional structural and syntactic recognition processes. Two statistical algorithms for identifying Amharic characters are described. In both, the characters are normalised for both size and orientation. The first compares the character against a series of templates. The second derives a characteristic signature from the character and compares this against a set of signature templates. The signatures used are fifty times smaller than the original character, and the recognition process is correspondingly faster but with some loss of accuracy. The statistical techniques described have been fully implemented and the resulting performance is outlined.
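The two statistical approaches in the abstract can be illustrated with a minimal sketch. The abstract does not say how the signature is derived or what distance measures are used, so everything below is an assumption made for illustration: the 50 x 50 normalised size, the row-projection signature (which happens to be fifty times smaller than a 50 x 50 image), the distance functions, and all function names are hypothetical. Orientation normalisation is omitted for brevity.

```python
from typing import Dict, List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background

SIZE = 50  # hypothetical normalised side length, not taken from the paper


def normalise_size(img: Image, size: int = SIZE) -> Image:
    """Crop to the ink bounding box, then rescale to size x size
    using nearest-neighbour sampling."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:  # blank image: nothing to crop
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    return [[img[top + r * h // size][left + c * w // size]
             for c in range(size)] for r in range(size)]


def pixel_distance(a: Image, b: Image) -> int:
    """Number of pixels that differ between two equal-sized images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def signature(img: Image) -> List[int]:
    """Assumed signature: ink count per row (a projection profile).
    For a 50 x 50 image this yields 50 values instead of 2500 pixels."""
    return [sum(row) for row in img]


def signature_distance(s: List[int], t: List[int]) -> int:
    """Sum of absolute differences between two signatures."""
    return sum(abs(x - y) for x, y in zip(s, t))


def classify_by_template(img: Image, templates: Dict[str, Image]) -> str:
    """First approach: compare the normalised character with each template."""
    norm = normalise_size(img)
    return min(templates, key=lambda k: pixel_distance(norm, templates[k]))


def classify_by_signature(img: Image, sigs: Dict[str, List[int]]) -> str:
    """Second approach: compare the character's signature with stored ones."""
    sig = signature(normalise_size(img))
    return min(sigs, key=lambda k: signature_distance(sig, sigs[k]))


# Toy glyphs standing in for real characters (not Amharic shapes).
GLYPH_L: Image = [[1, 0, 0, 0, 0]] * 4 + [[1, 1, 1, 1, 1]]
GLYPH_T: Image = [[1, 1, 1, 1, 1]] + [[0, 0, 1, 0, 0]] * 4

TEMPLATES = {"L": normalise_size(GLYPH_L), "T": normalise_size(GLYPH_T)}
SIGNATURES = {label: signature(t) for label, t in TEMPLATES.items()}
```

With these toy templates, `classify_by_template(GLYPH_T, TEMPLATES)` and `classify_by_signature(GLYPH_T, SIGNATURES)` both select the `"T"` entry. The signature comparison touches 50 values per template rather than 2500, which mirrors the speed-for-accuracy trade-off the abstract describes. A real system would need one template per character class (33 x 7 = 231 for the basic script).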