Arabic named entity recognition: using features extracted from noisy data

  • Authors: Yassine Benajiba; Imed Zitouni; Mona Diab; Paolo Rosso

  • Affiliations: Columbia University; IBM T.J. Watson Research Center, Yorktown Heights; Columbia University; Universidad Politécnica de Valencia

  • Venue: ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
  • Year: 2010

Abstract

Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we explore a feature space that combines gold features with bootstrapped noisy features in order to build a more accurate Arabic NER system. We bootstrap the noisy features by projection from an Arabic-English parallel corpus that is automatically tagged with a baseline NER system. The feature space covers lexical, morphological, and syntactic features. The proposed approach yields an improvement of up to 1.64 points of F-measure (absolute).
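
To make the projection step concrete, the following is a minimal sketch of how tags might be carried across a word-aligned sentence pair. It is not the authors' implementation. It assumes that the English side is the one tagged by the baseline system, that word alignments are already available as index pairs, and that a plain (non-BIO) tag set is used. The function name `project_tags`, the toy sentence, and the transliterated Arabic tokens are all hypothetical.

```python
from typing import List, Tuple


def project_tags(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Copy each source-side entity tag onto the target tokens aligned to it.

    src_tags:  one tag per source token ("O" means no entity).
    alignment: (source_index, target_index) pairs from a word aligner.
    tgt_len:   number of tokens in the target sentence.
    """
    projected = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment:
        tag = src_tags[src_idx]
        # Keep the first entity tag that reaches a target token; a later
        # alignment link does not overwrite it.
        if tag != "O" and projected[tgt_idx] == "O":
            projected[tgt_idx] = tag
    return projected


# Hypothetical toy example (assumed data, not from the paper).
english_tokens = ["Obama", "visited", "Cairo"]
english_tags = ["PER", "O", "LOC"]           # output of a baseline tagger
arabic_tokens = ["zAr", "AwbAmA", "AlqAhrp"]  # verb-first word order
links = [(0, 1), (1, 0), (2, 2)]              # (English index, Arabic index)

noisy_arabic_tags = project_tags(english_tags, links, len(arabic_tokens))
print(list(zip(arabic_tokens, noisy_arabic_tags)))
```

The projected tags are noisy because both the baseline tagger and the aligner make errors, which is why the abstract describes them as features for the Arabic NER system rather than as labels.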