Rapid development of Hindi named entity recognition using conditional random fields and feature induction

  • Authors:
  • Wei Li;Andrew McCallum

  • Affiliations:
  • University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes our application of conditional random fields with feature induction to a Hindi named entity recognition task. With only five days development time and little knowledge of this language, we automatically discover relevant features by providing a large array of lexical tests and using feature induction to automatically construct the features that most increase conditional likelihood. In an effort to reduce overfitting, we use a combination of a Gaussian prior and early stopping based on the results of 10-fold cross validation.