Language models for online handwritten Tamil word recognition

Authors:
Suresh Sundaram;Bhargava Urala K;A. G. Ramakrishnan
Affiliations:
HP Research Labs, Bangalore, India;Indian Institute of Science, Bangalore, India;Indian Institute of Science, Bangalore, India
Venue:
Proceeding of the workshop on Document Analysis and Recognition
Year:
2012

Abstract

N-gram language models and lexicon-based word recognition are popular methods in the literature for improving the recognition accuracy of online and offline handwritten data. However, very few works apply these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and we apply them to improve symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol recognition accuracy by 3% and word recognition accuracy by 8%. Lexicon methods offer a much greater improvement in word recognition accuracy (30%), but the gain depends heavily on choosing the right lexicon. For comparison with the lexicon and language model based methods, we have also explored re-evaluation techniques, which use expert classifiers to improve symbol and word recognition accuracies.
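
To illustrate how a symbol-level bigram language model can rescore classifier output, here is a minimal sketch. It is not the authors' implementation: the add-alpha smoothing, the Viterbi-style search, the `lm_weight` parameter, the function names, and the toy Latin-letter "symbols" are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical word-start marker


def train_bigram(words, alpha=1.0):
    """Return a function giving add-alpha smoothed log P(symbol | previous).

    `words` is a list of symbol sequences taken from a text corpus.
    """
    vocab = {s for w in words for s in w}
    context, bigram = Counter(), Counter()
    for w in words:
        prev = START
        for s in w:
            context[prev] += 1
            bigram[(prev, s)] += 1
            prev = s

    def logp(prev, s):
        num = bigram[(prev, s)] + alpha
        den = context[prev] + alpha * len(vocab)
        return math.log(num / den)

    return logp


def decode(candidates, logp, lm_weight=1.0):
    """Viterbi search over per-position candidate symbols.

    `candidates[t]` maps each candidate symbol at position t to the
    classifier's log-score for it. Returns the best symbol sequence.
    """
    # best[s] = (total score, path) of the best hypothesis ending in s
    best = {s: (score + lm_weight * logp(START, s), [s])
            for s, score in candidates[0].items()}
    for cand in candidates[1:]:
        new_best = {}
        for s, score in cand.items():
            total, path = max(
                ((p_score + lm_weight * logp(p, s), p_path)
                 for p, (p_score, p_path) in best.items()),
                key=lambda item: item[0],
            )
            new_best[s] = (total + score, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy corpus and classifier scores (made-up numbers, not from the paper).
corpus = [list("abc"), list("abc"), list("abd"), list("bca")]
logp = train_bigram(corpus)
scores = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"b": math.log(0.8), "d": math.log(0.2)},
    {"c": math.log(0.45), "d": math.log(0.55)},
]
without_lm = decode(scores, logp, lm_weight=0.0)
with_lm = decode(scores, logp, lm_weight=1.0)
```

In this toy case the classifier alone slightly prefers "d" in the last position, while the bigram statistics favour "c" after "b", so the two decodings differ in their final symbol. In practice the weight given to the language model would need tuning on held-out data.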