Expectations of word sense in parallel corpora

  • Authors:
  • Xuchen Yao;Benjamin Van Durme;Chris Callison-Burch

  • Affiliations:
  • Johns Hopkins University;Johns Hopkins University;Johns Hopkins University

  • Venue:
  • NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
  • Year:
  • 2012

Abstract

Given a parallel corpus, if two distinct words in language A, a1 and a2, are aligned to the same word b1 in language B, then this might signal that b1 is polysemous, or it might signal that a1 and a2 are synonyms. Both assumptions have been put forward in the literature, each supporting successful work. We investigate these assumptions, along with other questions of word sense, by looking at sampled parallel sentences containing tokens of the same type in English, asking how often they mean the same thing when they are (1) aligned to the same foreign type, and (2) aligned to different foreign types. Results for French-English and Chinese-English parallel corpora show similar behavior: synonymy is only very weakly the more prevalent scenario, and both cases regularly occur.
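
As a rough illustration of the two conditions compared in the abstract, the sketch below pairs up occurrences of the same English type and splits the pairs by whether they align to the same foreign type or to different ones. It is not the authors' procedure: the function name `partition_pairs`, the `(sentence_id, english_type, foreign_type)` input format, and the toy French alignments are all assumptions made for illustration, and word alignment itself is assumed to have been computed elsewhere.

```python
from collections import defaultdict
from itertools import combinations


def partition_pairs(occurrences):
    """Split same-English-type token pairs by foreign alignment.

    `occurrences` is an iterable of (sentence_id, english_type, foreign_type)
    tuples, a hypothetical format standing in for word-aligned corpus data.
    Returns two lists of (english_type, sentence_id_1, sentence_id_2):
    pairs aligned to the same foreign type, and pairs aligned to different ones.
    """
    by_english_type = defaultdict(list)
    for sentence_id, english_type, foreign_type in occurrences:
        by_english_type[english_type].append((sentence_id, foreign_type))

    same_foreign, different_foreign = [], []
    for english_type, occs in by_english_type.items():
        # Every unordered pair of occurrences of the same English type.
        for (s1, f1), (s2, f2) in combinations(occs, 2):
            bucket = same_foreign if f1 == f2 else different_foreign
            bucket.append((english_type, s1, s2))
    return same_foreign, different_foreign


# Toy data (invented): "bank" aligned to two different French types.
toy = [
    (1, "bank", "banque"),
    (2, "bank", "rive"),
    (3, "bank", "banque"),
    (4, "run", "courir"),
]
same_pairs, different_pairs = partition_pairs(toy)
```

Pairs from each list could then be sampled and judged for whether the two English tokens carry the same sense, which is the question the abstract poses for each condition.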