Name identification has been studied intensively over the past few years and has been incorporated into several natural language processing products. Many researchers have attacked the name identification problem in a variety of languages, but only a few limited research efforts have focused on named entity recognition (NER) for Arabic script. This is due to the lack of resources for Arabic named entities and the limited progress made in Arabic natural language processing in general. In this article, we present the results of our attempt at the recognition and extraction of the 10 most important categories of named entities in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system Named Entity Recognition for Arabic (NERA) using a rule-based approach. The resources created are a Whitelist, which is a dictionary of names, and a grammar, in the form of regular expressions, that recognizes the named entities. A filtration mechanism serves two different purposes: (a) revision of the results from a named entity extractor by using metadata, in terms of a Blacklist or rejecter, about ill-formed named entities and (b) disambiguation of identical or overlapping textual matches returned by different named entity extractors to get the correct choice. In NERA, we addressed major challenges posed by NER in the Arabic language that arise from the complexity of the language, peculiarities in the Arabic orthographic system, nonstandardization of the written text, ambiguity, and lack of resources. NERA was evaluated on our own tagged corpus and achieved satisfactory results in terms of precision, recall, and F-measure. © 2009 Wiley Periodicals, Inc.
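The pipeline described in the abstract (Whitelist lookup, a regular-expression grammar, a Blacklist rejecter, and disambiguation of overlapping matches) can be illustrated with a minimal Python sketch. It is not NERA's implementation: the whitelist entries, the patterns, the blacklist entry, and the function names are all hypothetical, and the "longest match wins" rule is an assumed disambiguation strategy, since the abstract does not say how the correct choice is made.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    start: int
    end: int
    label: str
    text: str


# Hypothetical resources for illustration only; NERA's real ones are not given.
PERSON_NAME = "محمد علي"
STREET = "شارع " + PERSON_NAME  # a street named after the person
CITY = "القاهرة"

WHITELIST = {"PERSON": [PERSON_NAME], "LOCATION": [STREET, CITY]}
GRAMMAR = {
    "DATE": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "PHONE": re.compile(r"\+?\d{2,3}-\d{3}-\d{4}"),
}
# Rejecter metadata: (label, text) pairs treated as ill-formed entities.
BLACKLIST = {("DATE", "00/00/0000")}


def extract(text: str) -> List[Match]:
    """Run every extractor (whitelist lookup and grammar) over the text."""
    found = []
    for label, names in WHITELIST.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                found.append(Match(m.start(), m.end(), label, m.group()))
    for label, pattern in GRAMMAR.items():
        for m in pattern.finditer(text):
            found.append(Match(m.start(), m.end(), label, m.group()))
    return found


def reject(matches: List[Match]) -> List[Match]:
    """Filtration step (a): drop matches listed in the blacklist."""
    return [m for m in matches if (m.label, m.text) not in BLACKLIST]


def disambiguate(matches: List[Match]) -> List[Match]:
    """Filtration step (b): among overlapping matches keep the longest (assumed rule)."""
    chosen: List[Match] = []
    for m in sorted(matches, key=lambda m: (-(m.end - m.start), m.start)):
        if all(m.end <= c.start or m.start >= c.end for c in chosen):
            chosen.append(m)
    return sorted(chosen)  # back to document order


def recognize(text: str) -> List[Match]:
    return disambiguate(reject(extract(text)))


if __name__ == "__main__":
    sample = " ".join([STREET, "في", CITY, "12/05/2009"])
    for m in recognize(sample):
        print(m.label, m.text)
```

In this sketch the person name embedded in the street name is matched by both the PERSON and LOCATION extractors, and the longer LOCATION match is kept. A real system would likely need context-sensitive rules rather than a pure length heuristic.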