Using information about multi-word expressions for the word-alignment task

Authors:
Sriram Venkatapathy;Aravind K. Joshi
Affiliations:
Indian Institute of Information Technology, Hyderabad, India;University of Pennsylvania, PA
Venue:
MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
Year:
2006

References (12):

Multiword Expressions: A Pain in the Neck for NLP. CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
A systematic comparison of various statistical alignment models. Computational Linguistics
Ultraconservative online algorithms for multiclass problems. The Journal of Machine Learning Research
The mathematics of statistical machine translation: parameter estimation. Computational Linguistics - Special issue on using large corpora: II
Idiomatic object usage and support verbs. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Word association norms, mutual information, and lexicography. ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
An empirical model of multiword expression decomposability. MWE '03 Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18
A discriminative matching approach to word alignment. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A discriminative framework for bilingual word alignment. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Non-projective dependency parsing using spanning tree algorithms. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features. HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Statistical ltag parsing

Cited by (2):

Stepwise mining of multi-word expressions in Hindi. MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Extraction of multi-word expressions from small parallel corpora. Natural Language Engineering

Abstract

It is well known that multi-word expressions are problematic in natural language processing. Previous literature has suggested that information about their degree of compositionality can be helpful in various applications, but this has not been proven empirically. In this paper, we propose a framework in which information about multi-word expressions can be used in the word-alignment task. We show that even simple features such as point-wise mutual information are useful for the word-alignment task on English-Hindi parallel corpora. The alignment error rate we achieve (AER = 0.5040) is significantly better (about a 10% decrease in AER) than that of the state-of-the-art models (Och and Ney, 2003) (best AER = 0.5518) on the English-Hindi dataset.
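
The two quantities named in the abstract, point-wise mutual information (PMI) and alignment error rate (AER), can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pmi_table` and `alignment_error_rate`, the toy verb-noun pairs, and the choice of base-2 logarithms are assumptions made for illustration. PMI is computed as log2(p(x, y) / (p(x) p(y))) from raw co-occurrence counts, and AER follows the standard definition with sure (S) and possible (P) gold links, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|).

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
Link = Tuple[int, int]


def pmi_table(pairs: List[Pair]) -> Dict[Pair, float]:
    """Base-2 PMI for every observed (left, right) pair.

    Probabilities are plain relative frequencies over the list of
    co-occurrences; no smoothing is applied (an illustrative simplification).
    """
    total = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    table = {}
    for (left, right), count in pair_counts.items():
        p_joint = count / total
        p_left = left_counts[left] / total
        p_right = right_counts[right] / total
        table[(left, right)] = math.log2(p_joint / (p_left * p_right))
    return table


def alignment_error_rate(predicted: Set[Link], sure: Set[Link],
                         possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S treated as a subset of P."""
    possible = possible | sure  # sure links always count as possible links
    if not predicted and not sure:
        return 0.0  # nothing to align and nothing predicted
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


# Hypothetical toy data: verb-noun co-occurrences, not taken from the paper.
toy_pairs = [("take", "place"), ("take", "place"),
             ("take", "book"), ("read", "book")]
scores = pmi_table(toy_pairs)

# Hypothetical alignment links as (source index, target index) pairs.
toy_aer = alignment_error_rate(
    predicted={(0, 0), (1, 1), (2, 3)},
    sure={(0, 0), (1, 1)},
    possible={(2, 2)},
)
```

A higher PMI means the two words co-occur more often than their individual frequencies would predict, which is why it can serve as a simple signal that a word pair behaves as a unit. A lower AER means the predicted links agree more closely with the gold alignment.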