Can projected chains in parallel corpora help coreference resolution?

  • Authors:
  • José Guilherme Camargo de Souza; Constantin Orăsan

  • Affiliations:
  • Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK; Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK

  • Venue:
  • DAARC'11 Proceedings of the 8th international conference on Anaphora Processing and Applications
  • Year:
  • 2011

Abstract

Most current coreference resolution systems rely on annotated corpora to train classifiers for this task. However, this is possible only for languages for which such corpora are available. This paper presents a system that automatically extracts coreference chains from Portuguese texts without requiring Portuguese corpora manually annotated with coreferential information. To achieve this, an English coreference resolver is run on the English side of an English-Portuguese parallel corpus. The coreference pairs identified by the resolver are projected onto the Portuguese side of the corpus using automatic word alignment, and the projected pairs are then used to train a coreference resolver for Portuguese. Evaluation shows that the system does not outperform a head match baseline. This is because most of the projected pairs share the same head, so the Portuguese classifier learns to rely on head matching. This suggests that a more accurate English coreference resolver is needed. A better projection algorithm is also likely to improve the system's performance.
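The abstract does not say how the projection step works. The sketch below is a minimal illustration that assumes one simple strategy, not the authors' algorithm. Each English mention is mapped to the smallest Portuguese span that covers all the tokens its words are aligned to. A coreference pair is kept only if both of its mentions survive projection. It also assumes that alignments are given per sentence pair in the "i-j" (Pharaoh) format that aligners such as fast_align emit. The function names `parse_alignment`, `project_mention` and `project_pairs`, and the mention tuple layout, are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mention layout: (sentence index, start token, end token), end exclusive.
Mention = Tuple[int, int, int]
Links = Dict[int, Set[int]]  # English token index -> aligned Portuguese token indices


def parse_alignment(line: str) -> Links:
    """Parse one sentence pair's alignment, assumed to be in 'i-j' (Pharaoh) format."""
    links: Links = {}
    for pair in line.split():
        src, tgt = pair.split("-")
        links.setdefault(int(src), set()).add(int(tgt))
    return links


def project_mention(m: Mention, alignments: List[Links]) -> Optional[Mention]:
    """Map an English mention to the smallest Portuguese span covering its aligned tokens."""
    sent, start, end = m
    targets: Set[int] = set()
    for i in range(start, end):
        targets |= alignments[sent].get(i, set())
    if not targets:
        return None  # no token is aligned, so the mention has no Portuguese counterpart
    return (sent, min(targets), max(targets) + 1)


def project_pairs(pairs: List[Tuple[Mention, Mention]],
                  alignments: List[Links]) -> List[Tuple[Mention, Mention]]:
    """Keep an (anaphor, antecedent) pair only if both mentions project to distinct spans."""
    projected = []
    for anaphor, antecedent in pairs:
        a = project_mention(anaphor, alignments)
        b = project_mention(antecedent, alignments)
        if a is not None and b is not None and a != b:
            projected.append((a, b))
    return projected


if __name__ == "__main__":
    # EN: "The president arrived . | He spoke . | He was tired ."
    # PT: "O presidente chegou . | Ele falou . | Estava cansado ."
    alignments = [parse_alignment(s)
                  for s in ("0-0 1-1 2-2 3-3", "0-0 1-1 2-2", "1-0 2-1 3-2")]
    english_pairs = [((1, 0, 1), (0, 0, 2)),  # He -> The president
                     ((2, 0, 1), (0, 0, 2))]  # He -> The president (subject dropped in PT)
    print(project_pairs(english_pairs, alignments))  # prints [((1, 0, 1), (0, 0, 2))]
```

In this example the second pair is discarded. Portuguese allows the subject pronoun to be omitted, so the English "He" in the third sentence is left unaligned. This is one way a naive projection loses training pairs. The pairs that do survive are what a Portuguese pairwise classifier would then be trained on.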