Using information about multi-word expressions for the word-alignment task

  • Authors:
  • Sriram Venkatapathy; Aravind K. Joshi

  • Affiliations:
  • Indian Institute of Information Technology, Hyderabad, India; University of Pennsylvania, PA

  • Venue:
  • MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
  • Year:
  • 2006

Abstract

It is well known that multi-word expressions are problematic in natural language processing. Previous work has suggested that information about their degree of compositionality can help in various applications, but this has not been demonstrated empirically. In this paper, we propose a framework in which information about multi-word expressions can be used in the word-alignment task. We show that even simple features such as point-wise mutual information are useful for the word-alignment task on English-Hindi parallel corpora. The alignment error rate that we achieve (AER = 0.5040) is significantly better (about a 10% decrease in AER) than the alignment error rates of the state-of-the-art models (Och and Ney, 2003) (best AER = 0.5518) on the English-Hindi dataset.
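
As an illustration of the kind of simple feature the abstract mentions, point-wise mutual information for a word pair can be computed from corpus counts as log2(P(x, y) / (P(x) P(y))). The sketch below is a minimal example under stated assumptions: it scores adjacent word pairs in a tokenized monolingual corpus, and the function name `pmi_scores` and the toy corpus are hypothetical. The abstract does not say how the authors select candidate pairs or estimate the probabilities, so this should not be read as their implementation.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Return the PMI (in bits) of every adjacent word pair in `sentences`.

    `sentences` is a list of token lists. Probabilities are plain relative
    frequencies; no smoothing is applied (an assumption of this sketch).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Adjacent pairs only; pairing by adjacency is an assumption here.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    corpus = [
        ["take", "place"],
        ["take", "place"],
        ["give", "up"],
        ["take", "up"],
        ["give", "place"],
    ]
    for pair, score in sorted(pmi_scores(corpus).items()):
        print(pair, round(score, 3))
```

A high PMI indicates that two words co-occur more often than their individual frequencies would predict, which is why it is commonly used as a rough signal that a pair may behave as a unit. How such a score is turned into an alignment feature is left open here.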