A real time Named Entity Recognition system for Arabic text mining

  • Authors:
  • Harith Al-Jumaily; Paloma Martínez; José L. Martínez-Fernández; Erik Goot

  • Affiliations:
  • Computer Science Department, Carlos III University of Madrid, Leganés, Madrid, Spain 28911; Computer Science Department, Carlos III University of Madrid, Leganés, Madrid, Spain 28911; DAEDALUS - Data, Decisions and Language S.A., Madrid, Spain 28031; EC Joint Research Centre, Ispra, Italy 27549

  • Venue:
  • Language Resources and Evaluation
  • Year:
  • 2012

Abstract

Arabic is the most widely spoken language in the Arab World. Most people in the Islamic World understand Classical Arabic because it is the language of the Qur'an. Although the number of Arabic-speaking Internet users (in the Middle East and North and East Africa) has increased considerably in the last decade, systems that automatically analyze Arabic digital resources are not as readily available as they are for English. Therefore, this work attempts to build a real-time Named Entity Recognition system that can be used in web applications to detect the appearance of specific named entities and events in news written in Arabic. Arabic is a highly inflectional language, so we try to minimize the impact of Arabic affixes on the quality of the pattern recognition model applied to identify named entities. These patterns are built by processing and integrating different gazetteers, from DBPedia (http://dbpedia.org/About, 2009) to GATE (A general architecture for text engineering, 2009) and ANERGazet (http://users.dsic.upv.es/grupos/nle/?file=kop4.php).
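
To make the idea of affix-tolerant gazetteer matching concrete, the following is a minimal Python sketch and not the authors' system. The three-entry gazetteer, the prefix list, the minimum stem length, and the names `GAZETTEER`, `PREFIXES`, `candidates`, `lookup`, and `tag` are all illustrative assumptions; a real gazetteer would be compiled from resources such as DBPedia, GATE, and ANERGazet, and real affix handling would be considerably richer.

```python
# Hypothetical toy gazetteer mapping surface forms to entity types.
GAZETTEER = {
    "مصر": "LOCATION",      # Egypt
    "القاهرة": "LOCATION",  # Cairo (stored with the definite article)
    "محمد": "PERSON",       # Mohamed
}

# Assumed list of common proclitics (conjunctions and prepositions),
# optionally fused with the definite article. Longer prefixes come first.
PREFIXES = ("وال", "فال", "بال", "كال", "و", "ف", "ب", "ل", "ك")

MIN_STEM_LENGTH = 2  # assumption: never strip a token down to one letter


def candidates(token):
    """Yield the token itself, then variants with one prefix removed."""
    yield token
    for prefix in PREFIXES:
        stem = token[len(prefix):]
        if token.startswith(prefix) and len(stem) >= MIN_STEM_LENGTH:
            yield stem


def lookup(token):
    """Return (matched_form, entity_type) for the first hit, else None."""
    for form in candidates(token):
        if form in GAZETTEER:
            return form, GAZETTEER[form]
    return None


def tag(text):
    """Tag whitespace-separated tokens with their gazetteer match, if any."""
    return [(token, lookup(token)) for token in text.split()]


if __name__ == "__main__":
    for token, match in tag("يعيش محمد بالقاهرة"):
        print(token, match)
```

Trying the unmodified token first keeps exact matches from being damaged by over-stripping; prefixed variants are only consulted as a fallback. This sketch handles a single layer of prefixes and ignores suffixes, which a production system for Arabic would also need to address.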