Arabic named entity recognition using optimized feature sets

Authors:
Yassine Benajiba;Mona Diab;Paolo Rosso
Affiliations:
Universidad Politécnica de Valencia;Columbia University;Universidad Politécnica de Valencia
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

Named Entity Recognition (NER) has attracted significant attention in NLP because it improves the performance of many natural language processing applications. In this paper, we investigate the impact of different feature sets in two discriminative machine learning frameworks, Support Vector Machines and Conditional Random Fields, using Arabic data. We explore lexical, contextual, and morphological features on eight standardized datasets of different genres. We measure the impact of each feature in isolation, rank the features by their impact on each named entity class, and incrementally combine them to infer the optimal machine learning approach and feature set. Our system achieves an F-measure (β = 1) of 83.5 on ACE 2003 Broadcast News data.
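
The rank-then-combine procedure described in the abstract can be illustrated with a minimal Python sketch. This is only one plausible reading of the abstract, not the authors' implementation. The `evaluate` callback is hypothetical and stands in for training an SVM or CRF tagger and scoring it on held-out data. The feature name `noise` and all toy scores are invented for illustration and are not results from the paper. The `f_beta` helper is the standard F-measure, which reduces to the harmonic mean of precision and recall when β = 1.

```python
from typing import Callable, List, Sequence, Tuple


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-measure; beta = 1 weights precision and recall equally."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def rank_features(
    features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[str]:
    """Score each feature in isolation and sort from most to least helpful."""
    return sorted(features, key=lambda f: evaluate([f]), reverse=True)


def incremental_selection(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add features in ranked order and keep the best-scoring prefix."""
    best_set: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked:
        current.append(feature)
        score = evaluate(current)
        if score > best_score:
            best_set, best_score = list(current), score
    return best_set, best_score


# Toy stand-in for "train a model and measure it" (assumption: made-up
# integer gains, chosen only so the example is deterministic).
TOY_GAINS = {"lexical": 5, "contextual": 3, "morphological": 1, "noise": -1}


def toy_evaluate(feature_set: List[str]) -> int:
    return sum(TOY_GAINS[f] for f in feature_set)


ranked = rank_features(list(TOY_GAINS), toy_evaluate)
best_set, best_score = incremental_selection(ranked, toy_evaluate)
```

In a real setting, the ranking would be computed separately for each named entity class and for each learning framework, so that both the feature set and the choice between SVM and CRF could be compared on the same footing.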