Recognition of Printed Amharic Documents

Authors:
Million Meshesha;C. V. Jawahar
Affiliations:
International Institute of Information Technology - Hyderabad, India;International Institute of Information Technology - Hyderabad, India
Venue:
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Year:
2005

Abstract

In Africa, there are a number of languages with their own indigenous scripts. This paper presents an OCR for Amharic script. Amharic is the official and working language of Ethiopia. This is possibly the first attempt towards the development of an OCR system for Amharic. Research in the recognition of Amharic script faces major challenges due to (i) the use of more than 300 characters in writing and (ii) the existence of a large set of visually similar characters. In this paper, we propose a two-stage feature extraction scheme using PCA and LDA, followed by a decision DAG classifier with SVMs as the nodes. Recognition results are presented to demonstrate the performance on various printing variations (fonts, styles and sizes) and on real-life degraded documents such as books, magazines and newspapers.
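The decision DAG step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the PCA and LDA feature extraction is omitted, each node uses a nearest-centroid rule as a stand-in for the SVM the paper places there, and the function names (`train_pairwise`, `ddag_predict`), class labels, and feature vectors are all hypothetical. It shows only the general idea that each node compares two candidate classes and rejects one, so N classes are resolved in N - 1 binary decisions.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(samples):
    """Build one binary decider per pair of classes.

    `samples` maps a class label to its list of feature vectors.
    Assumption: each decider picks the nearer class centroid. This is a
    stand-in only; the paper uses an SVM at each node.
    """
    centroids = {label: centroid(pts) for label, pts in samples.items()}
    deciders = {}
    for a, b in combinations(sorted(centroids), 2):
        ca, cb = centroids[a], centroids[b]
        # Default arguments bind the current pair to this lambda.
        deciders[(a, b)] = lambda x, a=a, b=b, ca=ca, cb=cb: (
            a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        )
    return deciders


def ddag_predict(x, classes, deciders):
    """Walk the decision DAG: every node rejects exactly one class."""
    remaining = sorted(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = deciders[(first, last)](x)
        if winner == first:
            remaining.pop()    # reject the last candidate
        else:
            remaining.pop(0)   # reject the first candidate
    return remaining[0]


# Hypothetical toy data: three classes with 2-D feature vectors.
samples = {
    "ha": [[0.0, 0.0], [0.2, 0.1]],
    "le": [[5.0, 5.0], [5.1, 4.9]],
    "me": [[0.0, 5.0], [0.1, 5.2]],
}
deciders = train_pairwise(samples)
print(ddag_predict([4.8, 5.3], samples, deciders))
```

With more than 300 character classes, a one-vs-one scheme needs a trained classifier for every pair, but a DAG walk of this kind evaluates only one classifier per rejected class at test time.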