Word-Wise Thai and Roman Script Identification

Authors:
Sukalpa Chanda;Umapada Pal;Oriol Ramos Terrades
Affiliations:
Indian Statistical Institute;Indian Statistical Institute;Univ. Politécnica De Valencia
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2009


Abstract

In some Thai documents, a single text line of a printed page may contain words in both Thai and Roman scripts. For Optical Character Recognition (OCR) of such a page, it is better to first identify the Thai and Roman portions and then apply the OCR system of the respective script to each identified portion. This article proposes an SVM-based method for word-wise identification of printed Roman and Thai scripts within a single line of a document page. The document is first segmented into lines, and each line is then segmented into character groups (words). The scheme identifies the script of a character group by combining character features derived from structural shape, profile behavior, component overlapping information, topological properties, and the water reservoir concept, among others. In an experiment on 10,000 words, the proposed scheme achieved 99.62% script identification accuracy.
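
The two segmentation steps named in the abstract (page to lines, lines to character groups) can be illustrated with a minimal sketch. The abstract does not say how the authors segment, so this example assumes a common projection-profile approach: rows with no ink separate lines, and runs of empty columns at least `min_gap` wide separate words. The function names, the binary list-of-lists image format, and the `min_gap` value are all hypothetical choices made for illustration, not taken from the article.

```python
def runs(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of the non-zero runs in a
    projection profile. Runs separated by fewer than min_gap zero entries
    are merged into one."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends before index i
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too narrow: same unit
        else:
            merged.append((s, e))
    return merged


def segment_lines(image):
    """Split a binary image (1 = ink) into text lines using the
    horizontal projection profile (ink count per row)."""
    row_profile = [sum(row) for row in image]
    return runs(row_profile)


def segment_words(image, line, min_gap=3):
    """Split one text line into character groups using the vertical
    projection profile (ink count per column within the line).
    min_gap is an assumed threshold: narrower gaps are treated as
    spacing between characters of the same word."""
    top, bottom = line
    width = len(image[0])
    col_profile = [
        sum(image[r][c] for r in range(top, bottom)) for c in range(width)
    ]
    return runs(col_profile, min_gap)
```

Each word span returned by `segment_words` would then be passed to feature extraction and an SVM classifier. Those stages are omitted here because the abstract lists only the feature families, not their definitions.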