Word sense disambiguation using a second language monolingual corpus
Computational Linguistics
WordNet: a lexical database for English
Communications of the ACM
A maximum entropy approach to natural language processing
Computational Linguistics
Cross-Language Information Retrieval
Cross-Language Information Retrieval
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Two languages are more informative than one
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Word-sense disambiguation using statistical methods
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Inducing multilingual text analysis tools via robust projection across aligned corpora
HLT '01 Proceedings of the first international conference on Human language technology research
Inducing information extraction systems for new languages via cross-language projection
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Sequential conditional Generalized Iterative Scaling
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
An unsupervised method for word sense tagging using parallel corpora
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Feature-rich part-of-speech tagging with a cyclic dependency network
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Language model based arabic word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
HowtogetaChineseName(Entity): segmentation and combination issues
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Distortion models for statistical machine translation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Mention detection crossing the language barrier
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Automatic tagging of Arabic text: from raw text to base phrase chunks
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
The impact of morphological stemming on Arabic mention detection and coreference resolution
Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Morphology-Based Segmentation Combination for Arabic Mention Detection
ACM Transactions on Asian Language Information Processing (TALIP)
Inducing domain-specific semantic class taggers from (almost) nothing
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Arabic named entity recognition: using features extracted from noisy data
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Improving mention detection robustness to noisy input
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Enhancing mention detection using projection via aligned corpora
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Aligned-Parallel-Corpora Based Semi-Supervised Learning for Arabic Mention Detection
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Hi-index | 0.00 |
In the last two decades, significant effort has been put into annotating linguistic resources in several languages. Despite this valiant effort, there are still many languages left that have only small amounts of such resources. The goal of this article is to present and investigate a method of propagating information (specifically mention detection) from a resource-rich language into a relatively resource-poor language such as Arabic. Part of the investigation is to quantify the contribution of propagating information in different conditions based on the availability of resources in the target language. Experiments on the language pair Arabic-English show that one can achieve relatively decent performance by propagating information from a language with richer resources such as English into Arabic alone (no resources or models in the source language Arabic). Furthermore, results show that propagated features from English do help improve the Arabic system performance even when used in conjunction with all feature types built from the source language. Experiments also show that using propagated features in conjunction with lexically derived features only (as can be obtained directly from a mention annotated corpus) brings the system performance at the one obtained in the target language by using feature derived from many linguistic resources, therefore improving the system when such resources are not available. In addition to Arabic-English language pair, we investigate the effectiveness of our approach on other language pairs such as Chinese-English and Spanish-English.