Hindi handwritten word recognition using HMM and symbol tree

  • Authors:
  • Swapnil Belhe;Chetan Paulzagade;Akash Deshmukh;Saumya Jetley;Kapil Mehrotra

  • Affiliations:
  • Center for Development of Advanced Computing (C-DAC), Pune, India (all authors)

  • Venue:
  • Proceedings of the Workshop on Document Analysis and Recognition
  • Year:
  • 2012

Abstract

The proposed approach recognizes online handwritten isolated Hindi words using a combination of hidden Markov models (HMMs) trained on Devanagari symbols and a tree formed by the multiple possible sequences of recognized symbols. In general, words in Indic languages are composed of a number of aksharas, or syllables, which in turn are formed by groups of consonants and vowel modifiers. Correct segmentation of aksharas is critical to accurate recognition of both the recognition primitives (symbols) and the complete word, and recognizing the symbols is itself an intricate task. Our work targets this combined task of akshara segmentation, symbol identification and subsequent word recognition, handling it in an integrated segmentation-recognition framework. Online stroke information is used to postulate symbol candidates, and a HOG (histogram of oriented gradients) feature set is derived from the image of each candidate; this makes recognition independent of stroke order and stroke-shape variations, so the system is well suited to unconstrained handwriting. Data for this work was collected from different parts of India where Hindi is predominantly used. Symbols extracted from 60,000 words were used to train and test 140 symbol HMMs. The system outputs one or more candidate words to the user by tracing multiple paths through the tree, down to the leaf nodes, on the condition that the symbol likelihood (confidence score) at every node is above a threshold. Tests on 10,000 words yielded an accuracy of 89%.
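The symbol-recognition step can be illustrated with a minimal sketch of how one candidate segment might be scored against each symbol HMM with the forward algorithm, keeping the best-scoring models as hypotheses. Purely for illustration, it assumes that each candidate's HOG features have already been vector-quantized into discrete codebook indices and that every symbol model is a discrete-emission HMM; the abstract does not specify the emission model. The class `DiscreteHMM`, the functions `classify`, `safe_log` and `log_sum_exp`, and the labels `sym_a`/`sym_b` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def safe_log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf  # every term has probability zero
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Hypothetical symbol model with discrete emissions (codebook indices)."""

    def __init__(self, start: Sequence[float],
                 trans: Sequence[Sequence[float]],
                 emit: Sequence[Sequence[float]]) -> None:
        # start[i] = P(first state is i); trans[i][j] = P(i -> j);
        # emit[i][k] = P(codeword k | state i). Stored as log-probabilities.
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.log_emit = [[safe_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: log P(obs | model), computed in log space."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.log_start)
        # alpha[j] = log P(o_1..o_t, state_t = j)
        alpha = [self.log_start[j] + self.log_emit[j][obs[0]] for j in range(n)]
        for o in obs[1:]:
            alpha = [
                log_sum_exp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][o]
                for j in range(n)
            ]
        return log_sum_exp(alpha)


def classify(models: Dict[str, DiscreteHMM],
             obs: Sequence[int]) -> List[Tuple[str, float]]:
    """Score obs against every symbol model; best (highest log-likelihood) first."""
    scored = [(label, model.log_likelihood(obs)) for label, model in models.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Two toy one-state models over a two-codeword codebook (illustrative values).
    models = {
        "sym_a": DiscreteHMM([1.0], [[1.0]], [[0.9, 0.1]]),
        "sym_b": DiscreteHMM([1.0], [[1.0]], [[0.1, 0.9]]),
    }
    print(classify(models, [0, 0, 1]))
```

Working in log space avoids numerical underflow on long observation sequences; the ranked list returned by `classify` is the kind of per-segment confidence that could populate the nodes of a symbol tree.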
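The candidate-word output can be sketched as a depth-first traversal of the symbol tree that abandons a branch as soon as a node's confidence score falls below the threshold, and reports every surviving root-to-leaf path as a candidate word. This is a minimal sketch under stated assumptions: the `SymbolNode` type and `trace_candidates` function are hypothetical, node labels are shown as whole Devanagari characters for readability, and ranking candidates by the product of node scores is an assumed rule; the abstract states only the per-node threshold condition.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SymbolNode:
    """Hypothetical node: one recognized symbol hypothesis in the symbol tree."""
    label: str                # symbol assigned by the recognizer
    score: float              # confidence score (assumed to lie in [0, 1])
    children: List["SymbolNode"] = field(default_factory=list)


def trace_candidates(roots: List[SymbolNode],
                     threshold: float) -> List[Tuple[str, float]]:
    """Return (word, path_score) for each root-to-leaf path on which every node
    scores at least `threshold`, best first. path_score is the product of the
    node scores (an assumed ranking rule)."""
    results: List[Tuple[str, float]] = []

    def visit(node: SymbolNode, prefix: str, path_score: float) -> None:
        if node.score < threshold:
            return  # prune this node and its whole subtree
        word = prefix + node.label
        score = path_score * node.score
        if not node.children:  # leaf: a complete candidate word
            results.append((word, score))
            return
        for child in node.children:
            visit(child, word, score)

    for root in roots:
        visit(root, "", 1.0)
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    tree = [
        SymbolNode("क", 0.9, [SymbolNode("म", 0.8, [SymbolNode("ल", 0.7)])]),
        SymbolNode("फ", 0.4, [SymbolNode("ल", 0.95)]),
    ]
    print(trace_candidates(tree, threshold=0.5))
    print(trace_candidates(tree, threshold=0.35))
```

With a threshold of 0.5, this toy tree yields only कमल, because the फ node (score 0.4) is pruned together with its subtree; lowering the threshold to 0.35 yields both कमल and फल, with कमल ranked first. Raising the threshold trades recall (fewer candidate words offered to the user) for a shorter, more confident list.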