Mention detection crossing the language barrier

Authors:
Imed Zitouni;Radu Florian
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Citing 19
Cited 12

Word sense disambiguation using a second language monolingual corpus

Computational Linguistics
WordNet: a lexical database for English

Communications of the ACM
A maximum entropy approach to natural language processing

Computational Linguistics
Cross-Language Information Retrieval

Cross-Language Information Retrieval
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Representing text chunks

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Two languages are more informative than one

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Word-sense disambiguation using statistical methods

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Inducing multilingual text analysis tools via robust projection across aligned corpora

HLT '01 Proceedings of the first international conference on Human language technology research
Inducing information extraction systems for new languages via cross-language projection

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Sequential conditional Generalized Iterative Scaling

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
An unsupervised method for word sense tagging using parallel corpora

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Language model based arabic word segmentation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
HowtogetaChineseName(Entity): segmentation and combination issues

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Distortion models for statistical machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
The impact of morphological stemming on Arabic mention detection and coreference resolution

Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

When Harry met Harri: cross-lingual name spelling normalization

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Can one language bootstrap the other: a case study on event extraction

SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
Cross-Language Information Propagation for Arabic Mention Detection

ACM Transactions on Asian Language Information Processing (TALIP)
Arabic named entity recognition: using features extracted from noisy data

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Improving mention detection robustness to noisy input

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Enhancing mention detection using projection via aligned corpora

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A cross-lingual annotation projection approach for relation detection

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Technical trend analysis by analyzing research papers' titles

LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
Translation-based projection for multilingual coreference resolution

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
A graph-based cross-lingual projection approach for weakly supervised relation extraction

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Cross-Lingual Annotation Projection for Weakly-Supervised Relation Extraction

ACM Transactions on Asian Language Information Processing (TALIP)
Aligned-Parallel-Corpora Based Semi-Supervised Learning for Arabic Mention Detection

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

While significant effort has been put into annotating linguistic resources for several languages, there are still many left that have only small amounts of such resources. This paper investigates a method of propagating information (specifically mention detection information) into such low resource languages from richer ones. Experiments run on three language pairs (Arabic-English, Chinese-English, and Spanish-English) show that one can achieve relatively decent performance by propagating information from a language with richer resources such as English into a foreign language alone (no resources or models in the foreign language). Furthermore, while examining the performance using various degrees of linguistic information in a statistical framework, results show that propagated features from English help improve the source-language system performance even when used in conjunction with all feature types built from the source language. The experiments also show that using propagated features in conjunction with lexically-derived features only (as can be obtained directly from a mention annotated corpus) yields similar performance to using feature types derived from many linguistic resources.