Think globally, apply locally: using distributional characteristics for Hindi named entity identification

  • Authors:
  • Shalini Gupta; Pushpak Bhattacharyya

  • Affiliations:
  • IIT Bombay, Mumbai, India; IIT Bombay, Mumbai, India

  • Venue:
  • NEWS '10 Proceedings of the 2010 Named Entities Workshop
  • Year:
  • 2010

Abstract

In this paper, we present a novel approach to Hindi Named Entity Identification (NEI) in a large corpus. The key idea is to harness the global distributional characteristics of the words in the corpus. We show that combining these global distributional characteristics with local context information improves NEI performance over statistical baseline systems that employ only local context. The improvement is substantial (about 10%) when the training and test corpora belong to different genres. We also propose a novel measure for NEI based on term informativeness and show that it is competitive with the best measure and better than other well-known information measures.
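
To make the idea of a corpus-level informativeness score concrete, the sketch below computes residual IDF, one well-known information measure. It is an illustration only: the abstract does not say which measures the paper evaluates, and this is not the novel measure the authors propose. The function name `residual_idf` and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def residual_idf(documents):
    """Return {term: residual IDF} for a list of tokenized documents.

    Residual IDF is the observed IDF minus the IDF expected if the term's
    occurrences were spread over documents by a Poisson process. Terms that
    cluster in a few documents (often names) score high.
    Illustrative sketch, not the measure proposed in the paper.
    """
    n_docs = len(documents)
    doc_freq = Counter()   # number of documents containing the term
    coll_freq = Counter()  # total occurrences of the term in the corpus
    for tokens in documents:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))

    scores = {}
    for term, df in doc_freq.items():
        observed_idf = -math.log2(df / n_docs)
        rate = coll_freq[term] / n_docs  # Poisson mean per document
        expected_idf = -math.log2(1.0 - math.exp(-rate))
        scores[term] = observed_idf - expected_idf
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus (transliterated tokens), not data from the paper.
    corpus = [
        ["mumbai", "mumbai", "mumbai", "mumbai", "mein"],
        ["mein", "ek"],
        ["mein", "do"],
        ["mein", "teen"],
    ]
    for term, score in sorted(residual_idf(corpus).items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{score:.3f}")
```

In this toy corpus the bursty token `mumbai` receives a higher score than the evenly spread function word `mein`. A score of this kind is a global, corpus-level signal that could be supplied as an extra feature next to local context features in a tagger.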