A maximum entropy approach to natural language processing
Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A Rational Design for a Weighted Finite-State Transducer Library
WIA '97 Revised Papers from the Second International Workshop on Implementing Automata
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Sequential conditional Generalized Iterative Scaling
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Feature-rich part-of-speech tagging with a cyclic dependency network
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Language model based arabic word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
HowtogetaChineseName(Entity): segmentation and combination issues
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Combination of Arabic preprocessing schemes for statistical machine translation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Arabic named entity recognition using optimized feature sets
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Automatic tagging of Arabic text: from raw text to base phrase chunks
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
The impact of morphological stemming on Arabic mention detection and coreference resolution
Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Cross-Language Information Propagation for Arabic Mention Detection
ACM Transactions on Asian Language Information Processing (TALIP)
Arabic Mention Detection: toward better unit of analysis
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Hi-index | 0.00 |
The Arabic language has a very rich/complex morphology. Each Arabic word is composed of zero or more prefixes, one stem and zero or more suffixes. Consequently, the Arabic data is sparse compared to other languages such as English, and it is necessary to conduct word segmentation before any natural language processing task. Therefore, the word-segmentation step is worth a deeper study since it is a preprocessing step which shall have a significant impact on all the steps coming afterward. In this article, we present an Arabic mention detection system that has very competitive results in the recent Automatic Content Extraction (ACE) evaluation campaign. We investigate the impact of different segmentation schemes on Arabic mention detection systems and we show how these systems may benefit from more than one segmentation scheme. We report the performance of several mention detection models using different kinds of possible and known segmentation schemes for Arabic text: punctuation separation, Arabic Treebank, and morphological and character-level segmentations. We show that the combination of competitive segmentation styles leads to a better performance. Results indicate a statistically significant improvement when Arabic Treebank and morphological segmentations are combined.