Named Entity Recognition (NER) has been attracting significant attention in NLP because it helps improve the performance of many natural language processing applications. In this paper, we investigate the impact of different feature sets in two discriminative machine learning frameworks, Support Vector Machines and Conditional Random Fields, on Arabic data. We explore lexical, contextual, and morphological features on eight standardized datasets of different genres. We measure the impact of each feature in isolation, rank the features by their impact on each named entity class, and combine them incrementally in order to infer the optimal machine learning approach and feature set. Our system achieves an F-measure (β=1) of 83.5 on ACE 2003 Broadcast News data.
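The procedure of scoring features in isolation, ranking them, and combining them incrementally can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the names `f_beta`, `rank_features`, `incremental_combination`, and `toy_evaluate` are hypothetical, the rule of keeping the best-scoring prefix of the ranked list is an assumption, and the toy scores are invented for demonstration only. In practice, `evaluate` would train an SVM or CRF tagger with the given features for one entity class and return its F-measure on held-out data.

```python
from typing import Callable, FrozenSet, List, Tuple

Evaluate = Callable[[FrozenSet[str]], float]


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-measure; beta=1 weights precision and recall equally."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def rank_features(features: List[str], evaluate: Evaluate) -> List[str]:
    """Score each feature in isolation and sort from best to worst."""
    scores = {f: evaluate(frozenset([f])) for f in features}
    return sorted(features, key=lambda f: scores[f], reverse=True)


def incremental_combination(
    features: List[str], evaluate: Evaluate
) -> Tuple[FrozenSet[str], float, List[Tuple[str, float]]]:
    """Add features one at a time in ranked order.

    Assumption: the selected set is the best-scoring prefix of the
    ranked list; the paper may use a different selection rule.
    """
    current: FrozenSet[str] = frozenset()
    best_set: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    history: List[Tuple[str, float]] = []
    for feat in rank_features(features, evaluate):
        current = current | {feat}
        score = evaluate(current)
        history.append((feat, score))
        if score > best_score:
            best_set, best_score = current, score
    return best_set, best_score, history


# Invented per-feature contributions, for illustration only
# (these are NOT results reported in the paper).
TOY_GAIN = {"lexical": 0.60, "contextual": 0.55, "morphological": 0.50}


def toy_evaluate(feature_set: FrozenSet[str]) -> float:
    """Stand-in for training a tagger and returning its F-measure."""
    miss = 1.0
    for feat in feature_set:
        miss *= 1.0 - TOY_GAIN[feat]
    return 1.0 - miss


if __name__ == "__main__":
    best, score, steps = incremental_combination(list(TOY_GAIN), toy_evaluate)
    for feat, s in steps:
        print(f"added {feat}: score={s:.3f}")
    print("selected:", sorted(best))
```

Running the selection once per named entity class, with a class-specific `evaluate`, would yield a separate ranking and feature set for each class, in line with the per-class ranking the abstract describes.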