An unsupervised model for joint phrase alignment and extraction

  • Authors:
  • Graham Neubig; Taro Watanabe; Eiichiro Sumita; Shinsuke Mori; Tatsuya Kawahara

  • Affiliations:
  • Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan and National Institute of Information and Communication Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan; National Institute of Information and Communication Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan; National Institute of Information and Communication Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan; Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan; Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year:
  • 2011

Abstract

We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through a novel formulation that memorizes phrases generated not only by terminal symbols but also by non-terminal symbols. This allows for a completely probabilistic model that builds a phrase table directly from unaligned sentence pairs and achieves competitive accuracy on phrase-based machine translation tasks. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of its original size.
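
As a rough illustration of what "memorizing" phrases can mean in a non-parametric Bayesian setting, the sketch below implements a simple Dirichlet-process-style cache over phrase pairs: a pair that has been generated before becomes more probable the next time. This is a simplified stand-in, not the model from the paper, which embeds this kind of memorization in an ITG derivation. The class `PhraseCache`, the function `uniform_base`, the concentration value, and the example phrase pair are all hypothetical.

```python
from collections import Counter


class PhraseCache:
    """Illustrative Dirichlet-process-style cache over phrase pairs."""

    def __init__(self, alpha, base_prob):
        self.alpha = alpha          # concentration parameter (assumed value)
        self.base_prob = base_prob  # hypothetical base measure over phrase pairs
        self.counts = Counter()     # how often each pair has been generated
        self.total = 0              # total number of generated pairs

    def prob(self, pair):
        # Predictive probability: reuse a memorized pair in proportion to its
        # count, or fall back to the base measure in proportion to alpha.
        numerator = self.counts[pair] + self.alpha * self.base_prob(pair)
        return numerator / (self.total + self.alpha)

    def add(self, pair):
        # Memorize one more occurrence of the pair.
        self.counts[pair] += 1
        self.total += 1


def uniform_base(pair):
    # Placeholder base measure: every pair gets the same small probability.
    return 0.001


cache = PhraseCache(alpha=1.0, base_prob=uniform_base)
pair = ("la maison", "the house")
before = cache.prob(pair)
cache.add(pair)
after = cache.prob(pair)
print(before, after)
```

After a single observation the cached pair is far more probable than under the base measure alone, which is the "rich get richer" behavior that lets such a model settle on a compact set of reusable phrases.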