A Semi-supervised Approach for Maximum Entropy Based Hindi Named Entity Recognition

  • Authors:
  • Sujan Kumar Saha;Pabitra Mitra;Sudeshna Sarkar

  • Affiliations:
  • Indian Institute of Technology, Kharagpur, India 721302;Indian Institute of Technology, Kharagpur, India 721302;Indian Institute of Technology, Kharagpur, India 721302

  • Venue:
  • PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
  • Year:
  • 2009

Abstract

Scarcity of annotated data is a challenge in building high-performance named entity recognition (NER) systems for resource-poor languages. We use a semi-supervised approach for the Hindi NER task that combines a small annotated corpus with a large raw corpus, using a maximum entropy classifier. A novel statistical annotation confidence measure is proposed for this purpose. The confidence measure is used in semi-supervised NER based on selective sampling. In addition, a prior modulation of the maximum entropy classifier is used, in which the annotation confidence values serve as 'prior weights'. Extensive experiments demonstrate the superiority of the proposed technique over the baseline classifier.
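The abstract combines two ideas: selecting confidently labeled tokens from the raw corpus, and letting each token's confidence scale its influence when the maximum entropy classifier is retrained. Below is a minimal pure-Python sketch of that loop, under stated assumptions. The paper's own statistical confidence measure is not described in the abstract, so the sketch substitutes the classifier's maximum posterior probability as a stand-in. Confidence is applied as a per-example weight on the log-likelihood, which is one simple reading of "prior weight" and not necessarily the authors' formulation. The names `WeightedMaxEnt`, `self_train`, `predict`, `threshold`, and the transliterated toy features are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[str], str]  # (token features, label)


class WeightedMaxEnt:
    """Multinomial logistic regression (maximum entropy) with per-example weights."""

    def __init__(self, labels: Sequence[str], lr: float = 0.5,
                 epochs: int = 50, l2: float = 0.01) -> None:
        self.labels = list(labels)
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w: Dict[Tuple[str, str], float] = {}  # (feature, label) -> weight

    def predict_proba(self, feats: Sequence[str]) -> List[float]:
        scores = [sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, data: List[Example], weights: List[float]) -> "WeightedMaxEnt":
        # Stochastic gradient ascent on the weighted conditional log-likelihood,
        # with a small L2 shrinkage applied to each weight that is updated.
        for _ in range(self.epochs):
            for (feats, gold), c in zip(data, weights):
                probs = self.predict_proba(feats)
                for y, p in zip(self.labels, probs):
                    g = c * ((1.0 if y == gold else 0.0) - p)
                    for f in feats:
                        old = self.w.get((f, y), 0.0)
                        self.w[(f, y)] = old + self.lr * (g - self.l2 * old)
        return self


def predict(model: WeightedMaxEnt, feats: Sequence[str]) -> str:
    probs = model.predict_proba(feats)
    return model.labels[probs.index(max(probs))]


def self_train(labeled: List[Example], raw: List[Sequence[str]],
               labels: Sequence[str], threshold: float = 0.6, rounds: int = 3):
    data = list(labeled)
    weights = [1.0] * len(data)  # gold annotations get full weight
    pool = list(raw)
    model = WeightedMaxEnt(labels).fit(data, weights)
    for _ in range(rounds):
        remaining = []
        for feats in pool:
            probs = model.predict_proba(feats)
            conf = max(probs)  # HYPOTHETICAL stand-in for the paper's confidence measure
            if conf >= threshold:
                data.append((feats, model.labels[probs.index(conf)]))
                weights.append(conf)  # confidence reused as the example's weight
            else:
                remaining.append(feats)
        if len(remaining) == len(pool):  # nothing new was selected; stop early
            break
        pool = remaining
        model = WeightedMaxEnt(labels).fit(data, weights)  # retrain on the enlarged set
    return model, data, weights


# Hypothetical toy data: transliterated Hindi tokens described by string features.
LABELS = ["PER", "LOC", "O"]
LABELED: List[Example] = [
    (["w=dilli", "suf=lli"], "LOC"),
    (["w=kanpur", "suf=pur"], "LOC"),
    (["prev=shri", "w=ram"], "PER"),
    (["prev=shri", "w=mohan"], "PER"),
    (["w=aur"], "O"),
    (["w=hai"], "O"),
]
RAW = [["w=jaipur", "suf=pur"], ["prev=shri", "w=shyam"], ["w=hai"]]

if __name__ == "__main__":
    model, data, weights = self_train(LABELED, RAW, LABELS)
    print(predict(model, ["suf=pur"]), predict(model, ["prev=shri"]))
    print(f"{len(data) - len(LABELED)} raw tokens added")
```

Retraining from scratch each round keeps the sketch simple. Gold tokens always carry weight 1.0, so each pseudo-labeled token counts for at most as much as one annotated token.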