Unsupervised multilingual grammar induction

Authors:
Benjamin Snyder;Tahira Naseem;Regina Barzilay
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Year:
2009

Citing 19
Cited 18

Alignment of trees: an alternative to tree edit

Theoretical Computer Science
Inducing Probabilistic Grammars by Bayesian Model Merging

ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications
A systematic comparison of various statistical alignment models

Computational Linguistics
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora

Computational Linguistics
Machine translation with a stochastic grammatical channel

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
A generative constituent-context model for improved grammar induction

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Multitext Grammars and synchronous parsers

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Learning non-isomorphic tree mappings for machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
The unsupervised learning of natural language structure

The unsupervised learning of natural language structure
Bootstrapping parsers via syntactic projection across parallel texts

Natural Language Engineering
Experiments in parallel-text based grammar induction

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
A hierarchical phrase-based model for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Stochastic lexicalized inversion transduction grammar for alignment

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A backoff model for bootstrapping resources for non-English languages

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Inducing a multilingual dictionary from a parallel multitext in related languages

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Two languages are better than one (for syntactic parsing)

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised multilingual learning for POS tagging

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Adding more languages improves unsupervised multilingual part-of-speech tagging: a Bayesian non-parametric approach

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Parser adaptation and projection with quasi-synchronous grammar features

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Joint parsing and alignment with weakly synchronized grammars

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Phylogenetic grammar induction

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Learning common grammar from multilingual corpus

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Identifying patterns for unsupervised grammar induction

CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Learning better monolingual models with unannotated bilingual text

CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
What's with the attitude?: identifying sentences with attitude in online discussions

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Nonparametric word segmentation for machine translation

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Unsupervised part-of-speech tagging with bilingual graph-based projections

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Using large monolingual and bilingual corpora to improve coordination disambiguation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Unsupervised multilingual learning

Unsupervised multilingual learning
Unsupervised structure prediction with non-parallel multilingual guidance

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Inducing sentence structure from parallel corpora for reordering

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Universal morphological analysis using structured nearest neighbor prediction

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Relaxed cross-lingual projection of constituent syntax

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Selective sharing for multilingual dependency parsing

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Crosslingual induction of semantic roles

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Universal grapheme-to-phoneme prediction over Latin alphabets

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English) we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an 8.8 absolute gain in F-measure.