Identifying idiomatic expressions using phrase alignments in bilingual parallel corpus

Authors:
Hyoung-Gyu Lee;Min-Jeong Kim;Gumwon Hong;Sang-Bum Kim;Young-Sook Hwang;Hae-Chang Rim
Affiliations:
Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea;Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea;Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea;Convergence Technology Center, SK Telecom;Convergence Technology Center, SK Telecom;Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea
Venue:
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Year:
2010

Abstract

Previous efforts to identify idiomatic expressions using a bilingual parallel corpus have focused on using word alignments to capture the sense of individual words. In this paper, we propose a method that uses phrase alignments rather than word alignments in a parallel corpus to recognize the sense of phrases as well as words. Our proposed scoring functions are based on the difference in translation tendency between a phrase and its individual words. They help identify idiomatic expressions through the entropy variation and the translation difference between a phrase and its individual words. Experimental results show that our proposed method is more effective than previous approaches for identifying idiomatic expressions. In addition, we demonstrate that linguistic constraints can be integrated into our method to improve performance.
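The abstract names two signals, the entropy variation and the translation difference between a phrase and its individual words, but does not give their formulas. The Python sketch below is one hypothetical way to operationalize them. It assumes phrase and word translation counts have already been extracted from phrase and word alignments. The functions `entropy_variation` and `translation_difference`, their exact definitions, and the toy English-to-French counts are illustrative assumptions, not the authors' scoring functions or data.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the translation distribution implied by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def entropy_variation(phrase_counts: Counter, word_counts: list[Counter]) -> float:
    """Hypothetical score: entropy of the phrase's translations minus the mean
    entropy of its component words' translations. Higher values mean the phrase
    translates less predictably than its words do."""
    mean_word = sum(entropy(c) for c in word_counts) / len(word_counts)
    return entropy(phrase_counts) - mean_word


def translation_difference(phrase_counts: Counter, word_counts: list[Counter]) -> float:
    """Hypothetical score: share of the phrase's aligned translations that contain
    none of the component words' most frequent translations."""
    word_best = {c.most_common(1)[0][0] for c in word_counts}
    total = sum(phrase_counts.values())
    unrelated = sum(
        n for translation, n in phrase_counts.items()
        if not word_best.intersection(translation.split())
    )
    return unrelated / total


# Invented toy counts (NOT data from the paper): English source -> French translations.
kick = Counter({"frapper": 5, "coup": 3})
the = Counter({"le": 6, "la": 4})
bucket = Counter({"seau": 9, "baquet": 1})

idiom = Counter({"casser sa pipe": 6, "mourir": 3, "frapper le seau": 1})  # "kick the bucket"
literal = Counter({"le seau": 9, "un seau": 1})                           # "the bucket"

for name, phrase, words in [("kick the bucket", idiom, [kick, the, bucket]),
                            ("the bucket", literal, [the, bucket])]:
    print(f"{name}: entropy_variation={entropy_variation(phrase, words):.3f}, "
          f"translation_difference={translation_difference(phrase, words):.2f}")
```

On this toy data, the non-compositional "kick the bucket" scores higher on both measures than the literal "the bucket", because most of its aligned translations share nothing with the usual translations of "kick", "the", or "bucket". A practical system would also need smoothing and frequency thresholds so that rare phrases with few alignments are not scored unreliably.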