Learning common grammar from multilingual corpus

  • Authors: Tomoharu Iwata; Daichi Mochihashi; Hiroshi Sawada

  • Affiliation (all authors): NTT Communication Science Laboratories, Soraku-gun, Kyoto, Japan

  • Venue: ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
  • Year: 2010

Abstract

We propose a corpus-based probabilistic framework that extracts hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. To this end, we assume a generative model for multilingual corpora in which each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are in turn generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
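
The two-level generative story in the abstract can be illustrated with a small sketch. This is not the authors' model. The abstract does not say which distributions are used, so the Dirichlet draw around a shared prior, the toy rule set, the concentration value, the depth cap, and every name below (RULES, COMMON_PRIOR, sample_language_pcfg, generate) are assumptions made only for illustration. Inference, which the paper handles with a variational method, is not shown.

```python
import random

# Hypothetical toy grammar: each nonterminal lists its candidate right-hand sides.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("noun",), ("adj", "NP")],
    "VP": [("verb",), ("verb", "NP"), ("NP", "verb")],
}

# Assumed common prior: mean rule probabilities shared by all languages.
COMMON_PRIOR = {
    "S": [1.0],
    "NP": [0.7, 0.3],
    "VP": [0.4, 0.3, 0.3],
}


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_language_pcfg(common_prior, concentration, rng):
    """Draw one language's rule probabilities, centered on the common prior."""
    return {
        lhs: sample_dirichlet([concentration * p for p in probs], rng)
        for lhs, probs in common_prior.items()
    }


def generate(symbol, pcfg, rng, depth=0, max_depth=20):
    """Expand a symbol top-down into a list of terminal tokens."""
    if symbol not in RULES:
        return [symbol]  # terminal symbol
    options = RULES[symbol]
    if depth >= max_depth:
        # Assumption for the sketch: cap recursion by taking the shortest rule.
        rhs = min(options, key=len)
    else:
        rhs = rng.choices(options, weights=pcfg[symbol])[0]
    tokens = []
    for child in rhs:
        tokens.extend(generate(child, pcfg, rng, depth + 1, max_depth))
    return tokens


rng = random.Random(0)
grammars = {
    lang: sample_language_pcfg(COMMON_PRIOR, 10.0, rng)
    for lang in ("lang_a", "lang_b")
}
for lang, pcfg in grammars.items():
    print(lang, " ".join(generate("S", pcfg, rng)))
```

In this sketch a larger concentration keeps each language's grammar closer to the shared prior, while a smaller one lets languages diverge, for example in how often the verb precedes or follows its object.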