Evaluating cross-language annotation transfer in the MultiSemCor corpus

Authors:
Luisa Bentivogli;Pamela Forner;Emanuele Pianta
Affiliations:
ITC-irst Via Sommarive, Povo - Trento, Italy;ITC-irst Via Sommarive, Povo - Trento, Italy;ITC-irst Via Sommarive, Povo - Trento, Italy
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

In this paper we illustrate and evaluate an approach to the creation of high-quality, linguistically annotated resources based on the exploitation of aligned parallel corpora. The approach rests on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target text using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus built on the basis of the English SemCor corpus. In MultiSemCor, texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution, allowing existing (mostly English) annotated resources to be exploited to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
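The transfer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `transfer_annotations`, the data layout (one optional sense label per token, alignment as pairs of token indices), the sense labels, and the rule that the first aligned label wins are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def transfer_annotations(
    source_senses: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Project sense labels from source tokens onto aligned target tokens.

    source_senses: one label per source token, or None if unannotated.
    alignment: (source_index, target_index) pairs from a word aligner.
    target_length: number of tokens in the target sentence.
    """
    target_senses: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        sense = source_senses[src_idx]
        # Assumption: keep the first label that reaches a target token;
        # unaligned target tokens simply stay unannotated.
        if sense is not None and target_senses[tgt_idx] is None:
            target_senses[tgt_idx] = sense
    return target_senses


# Hypothetical example: "the dog sleeps" -> "il cane dorme".
# The sense labels are placeholders, not real inventory identifiers.
english_senses = [None, "dog#1", "sleep#1"]
word_alignment = [(0, 0), (1, 1), (2, 2)]
italian_senses = transfer_annotations(english_senses, word_alignment, 3)
```

In this sketch the quality of the projected annotations depends entirely on the alignment pairs passed in, which is why the abstract speaks of evaluating the different steps of the methodology separately.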