Cross-Language Information Propagation for Arabic Mention Detection

Authors:
Imed Zitouni;Radu Florian
Affiliations:
IBM T. J. Watson Research Center;IBM T. J. Watson Research Center
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2009

Citing 21
Cited 5

Word sense disambiguation using a second language monolingual corpus

Computational Linguistics
WordNet: a lexical database for English

Communications of the ACM
A maximum entropy approach to natural language processing

Computational Linguistics
Cross-Language Information Retrieval

Cross-Language Information Retrieval
Representing text chunks

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Two languages are more informative than one

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Word-sense disambiguation using statistical methods

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Inducing multilingual text analysis tools via robust projection across aligned corpora

HLT '01 Proceedings of the first international conference on Human language technology research
Inducing information extraction systems for new languages via cross-language projection

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Sequential conditional Generalized Iterative Scaling

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
An unsupervised method for word sense tagging using parallel corpora

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Feature-rich part-of-speech tagging with a cyclic dependency network

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Language model based arabic word segmentation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
HowtogetaChineseName(Entity): segmentation and combination issues

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Distortion models for statistical machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Mention detection crossing the language barrier

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Automatic tagging of Arabic text: from raw text to base phrase chunks

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
The impact of morphological stemming on Arabic mention detection and coreference resolution

Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Morphology-Based Segmentation Combination for Arabic Mention Detection

ACM Transactions on Asian Language Information Processing (TALIP)

Inducing domain-specific semantic class taggers from (almost) nothing

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Arabic named entity recognition: using features extracted from noisy data

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Improving mention detection robustness to noisy input

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Enhancing mention detection using projection via aligned corpora

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Aligned-Parallel-Corpora Based Semi-Supervised Learning for Arabic Mention Detection

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

In the last two decades, significant effort has been put into annotating linguistic resources in several languages. Despite this valiant effort, there are still many languages left that have only small amounts of such resources. The goal of this article is to present and investigate a method of propagating information (specifically mention detection) from a resource-rich language into a relatively resource-poor language such as Arabic. Part of the investigation is to quantify the contribution of propagating information in different conditions based on the availability of resources in the target language. Experiments on the language pair Arabic-English show that one can achieve relatively decent performance by propagating information from a language with richer resources such as English into Arabic alone (no resources or models in the source language Arabic). Furthermore, results show that propagated features from English do help improve the Arabic system performance even when used in conjunction with all feature types built from the source language. Experiments also show that using propagated features in conjunction with lexically derived features only (as can be obtained directly from a mention annotated corpus) brings the system performance at the one obtained in the target language by using feature derived from many linguistic resources, therefore improving the system when such resources are not available. In addition to Arabic-English language pair, we investigate the effectiveness of our approach on other language pairs such as Chinese-English and Spanish-English.