A hidden Markov model based named entity recognition system: Bengali and Hindi as case studies

  • Authors:
  • Asif Ekbal;Sivaji Bandyopadhyay

  • Affiliations:
  • Computer Science and Engineering Department, Jadavpur University, Kolkata, India;Computer Science and Engineering Department, Jadavpur University, Kolkata, India

  • Venue:
  • PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
  • Year:
  • 2007

Abstract

Named Entity Recognition (NER) plays an important role in almost all Natural Language Processing (NLP) application areas, including information retrieval, machine translation, question answering, and automatic summarization. This paper reports the development of a statistical Hidden Markov Model (HMM) based NER system. The system was initially developed for Bengali using a tagged Bengali news corpus built from the web archive of a leading Bengali newspaper. It is trained on a corpus of 150,000 wordforms, initially tagged with an HMM-based part-of-speech (POS) tagger. A 10-fold cross-validation test yields average Recall, Precision and F-Score values of 90.2%, 79.48% and 84.5%, respectively. The same HMM-based NER system is then trained and tested on Hindi data to demonstrate its language independence. With 27,151 Hindi wordforms, a 10-fold cross-validation test yields average Recall, Precision and F-Score values of 82.5%, 74.6% and 78.35%, respectively.
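As a quick sanity check on the reported figures, the sketch below recomputes the F-Score from the stated Recall and Precision. It assumes the F-Score is the balanced harmonic mean of the two (beta = 1), which the abstract does not state explicitly; the function name `f_score` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_score(recall, precision, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    Inputs and output are percentages. beta = 1 gives the balanced
    F-Score (an assumption; the abstract does not name the weighting).
    """
    if recall + precision == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Recall and Precision values as reported in the abstract.
bengali = f_score(recall=90.2, precision=79.48)
hindi = f_score(recall=82.5, precision=74.6)

print(f"Bengali F-Score: {bengali:.2f}")
print(f"Hindi F-Score: {hindi:.2f}")
```

Under this assumption the recomputed values agree with the reported 84.5% and 78.35% to two decimal places. Note that an average of per-fold F-Scores need not equal the F-Score of the averaged Recall and Precision in general, so the agreement here is a consistency check rather than a reconstruction of the authors' evaluation procedure.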