Multiword Expressions: A Pain in the Neck for NLP
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
A systematic comparison of various statistical alignment models
Computational Linguistics
Ultraconservative online algorithms for multiclass problems
The Journal of Machine Learning Research
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Idiomatic object usage and support verbs
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Word association norms, mutual information, and lexicography
ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
An empirical model of multiword expression decomposability
MWE '03 Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18
A discriminative matching approach to word alignment
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A discriminative framework for bilingual word alignment
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Non-projective dependency parsing using spanning tree algorithms
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Statistical ltag parsing
Stepwise mining of multi-word expressions in Hindi
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Extraction of multi-word expressions from small parallel corpora
Natural Language Engineering
Hi-index | 0.00 |
It is well known that multi-word expressions are problematic in natural language processing. In previous literature, it has been suggested that information about their degree of compositionality can be helpful in various applications but it has not been proven empirically. In this paper, we propose a framework in which information about the multi-word expressions can be used in the word-alignment task. We have shown that even simple features like point-wise mutual information are useful for word-alignment task in English-Hindi parallel corpora. The alignment error rate which we achieve (AER = 0.5040) is significantly better (about 10% decrease in AER) than the alignment error rates of the state-of-art models (Och and Ney, 2003) (Best AER = 0.5518) on the English-Hindi dataset.