Arabic named entity recognition using optimized feature sets

  • Authors: Yassine Benajiba; Mona Diab; Paolo Rosso
  • Affiliations: Universidad Politécnica de Valencia; Columbia University; Universidad Politécnica de Valencia
  • Venue: EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year: 2008

Abstract

Named Entity Recognition (NER) has been garnering significant attention in NLP because it helps improve the performance of many natural language processing applications. In this paper, we investigate the impact of using different sets of features in two discriminative machine learning frameworks, namely Support Vector Machines and Conditional Random Fields, on Arabic data. We explore lexical, contextual and morphological features on eight standardized data sets of different genres. We measure the impact of the different features in isolation, rank them according to their impact for each named entity class, and incrementally combine them in order to infer the optimal machine learning approach and feature set. Our system yields an F-measure (β=1) of 83.5 on ACE 2003 Broadcast News data.
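
The procedure the abstract describes (score each feature in isolation, rank, then combine incrementally) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `rank_and_combine`, `evaluate`, `toy_evaluate`, and the features `feat_a`, `feat_b`, `feat_c` are hypothetical, and the toy scores are invented for illustration only; they are not results from the paper. In practice `evaluate` would train an SVM or CRF tagger on the chosen features and return its F-measure on held-out data, and the ranking would be repeated for each named entity class. The `f_beta` helper shows the standard F-measure, which at β=1 is the harmonic mean of precision and recall (here on a 0 to 1 scale, whereas the abstract reports it on a 0 to 100 scale).

```python
from typing import Callable, List, Sequence, Tuple


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-measure; beta=1 is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def rank_and_combine(
    features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float, List[Tuple[List[str], float]]]:
    """Rank features by their score in isolation, then add them in rank order.

    Returns the best-scoring prefix of the ranking, its score, and the
    full history of (feature subset, score) pairs.
    """
    # Step 1: score every feature on its own.
    solo = {f: evaluate([f]) for f in features}
    # Step 2: rank features by solo score, best first.
    ranked = sorted(features, key=lambda f: solo[f], reverse=True)
    # Step 3: grow the feature set one feature at a time, tracking the best prefix.
    history: List[Tuple[List[str], float]] = []
    best_set: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        history.append((subset, score))
        if score > best_score:
            best_set, best_score = subset, score
    return best_set, best_score, history


# Hypothetical stand-in for "train a tagger and measure F on held-out data".
# The numbers below are made up purely so the sketch runs end to end.
TOY_GAINS = {"feat_a": 0.3, "feat_b": 0.5, "feat_c": -0.1}


def toy_evaluate(subset: Sequence[str]) -> float:
    return sum(TOY_GAINS[f] for f in subset)


if __name__ == "__main__":
    best, score, history = rank_and_combine(list(TOY_GAINS), toy_evaluate)
    print("best feature set:", best)
    for subset, s in history:
        print(subset, round(s, 3))
```

Evaluating only prefixes of the ranking keeps the number of training runs linear in the number of features, at the cost of possibly missing a better subset that is not a prefix.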