NERA: Named Entity Recognition for Arabic

Authors:
Khaled Shaalan; Hafsa Raza
Affiliations:
Faculty of Informatics, The British University in Dubai, P.O. Box 502216, Dubai, United Arab Emirates (both authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2009

Abstract

Name identification has been studied intensively over the past few years and has been incorporated into several products built around natural language processing tasks. Many researchers have attacked the name identification problem in a variety of languages, but only a few research efforts, limited in scope, have focused on named entity recognition (NER) for Arabic script. This is due to the lack of resources for Arabic named entities and the limited progress made in Arabic natural language processing in general. In this article, we present the results of our attempt at recognizing and extracting the 10 most important categories of named entities in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system, Named Entity Recognition for Arabic (NERA), using a rule-based approach. The resources created are a Whitelist, representing a dictionary of names, and a grammar, in the form of regular expressions, which is responsible for recognizing the named entities. A filtration mechanism serves two different purposes: (a) revision of the results from a named entity extractor by using metadata about ill-formed named entities, in the form of a Blacklist or rejecter, and (b) disambiguation of identical or overlapping textual matches returned by different named entity extractors to select the correct one. In NERA, we addressed major challenges posed by NER in the Arabic language, which arise from the complexity of the language, peculiarities of the Arabic orthographic system, nonstandardization of the written text, ambiguity, and lack of resources. NERA was evaluated on our own tagged corpus; it achieved satisfactory results in terms of precision, recall, and F-measure. © 2009 Wiley Periodicals, Inc.
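
The abstract names three kinds of resource (a Whitelist, a regular-expression grammar, and a Blacklist-based filter) but does not reproduce them. The following Python sketch shows one way such a pipeline could fit together. It is not NERA's implementation: the dictionary entries, the patterns, the honorific trigger, the Blacklist word, and the "prefer the longest match" disambiguation policy are all assumptions made up for illustration, and only three of the ten categories appear in the grammar.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    start: int
    end: int
    label: str
    text: str


# Hypothetical Whitelist: a tiny dictionary of known names per category.
WHITELIST = {
    "LOCATION": ["دبي", "القاهرة"],
    "PERSON": ["خالد شعلان"],
}

# Hypothetical grammar: one illustrative regular expression per category.
GRAMMAR = {
    "PHONE": re.compile(r"\+\d{1,3}[ -]\d{1,3}[ -]\d{6,8}"),
    "ISBN": re.compile(r"\b\d{1,5}-\d{1,7}-\d{1,7}-[\dX]\b"),
    # Two tokens that follow the honorific "الدكتور" (Dr.).
    "PERSON": re.compile(r"(?<=الدكتور )\S+ \S+"),
}

# Hypothetical Blacklist: words that make a candidate ill-formed,
# e.g. the verb "قال" (said) cannot be part of a person name.
BLACKLIST = {"PERSON": {"قال"}}


def extract(text):
    """Collect candidates from the Whitelist and from the grammar rules."""
    found = []
    for label, names in WHITELIST.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                found.append(Entity(m.start(), m.end(), label, m.group()))
    for label, pattern in GRAMMAR.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.start(), m.end(), label, m.group()))
    return found


def reject(entities):
    """Filter step (a): drop candidates containing a blacklisted word."""
    return [
        e for e in entities
        if not BLACKLIST.get(e.label, set()) & set(e.text.split())
    ]


def resolve_overlaps(entities):
    """Filter step (b): keep one match per span, preferring longer matches."""
    chosen = []
    for e in sorted(entities, key=lambda e: (-(e.end - e.start), e.start)):
        if all(e.end <= c.start or e.start >= c.end for c in chosen):
            chosen.append(e)
    return sorted(chosen)


def recognize(text):
    return resolve_overlaps(reject(extract(text)))


if __name__ == "__main__":
    sample = "الدكتور خالد شعلان في دبي هاتف +971 4 5550123"
    for entity in recognize(sample):
        print(entity.label, entity.text)
```

In the sample sentence, the person name is found twice, once by the Whitelist and once by the honorific rule, so the overlap step keeps a single copy. A real system would need a more careful disambiguation policy than span length, along with the orthographic normalization that the abstract lists among the challenges of Arabic text.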