Iterative rule segmentation under minimum description length for unsupervised transduction grammar induction

Authors:
Markus Saers, Karteek Addanki, Dekai Wu
Affiliation (all authors):
Human Language Technology Center, Dept. of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong
Venue:
SLSP'13 Proceedings of the First International Conference on Statistical Language and Speech Processing
Year:
2013

Abstract

We argue that for purely incremental unsupervised learning of phrasal inversion transduction grammars, an iterative top-down rule segmentation approach driven by minimum description length yields significantly better translation accuracy and grammar parsimony than its polar opposite, the bottom-up iterative rule chunking model previously proposed by Saers, Addanki, and Wu (2012). We still aim for unsupervised bilingual grammar induction in which training and testing are optimized on exactly the same underlying model. This is a basic principle of machine learning and statistical prediction that has been unduly ignored in recent statistical machine translation models, where most decoders are badly mismatched to the training assumptions. Our approach learns phrasal translations by recursively subsegmenting the training corpus, whereas our previous model started with a token-based transduction grammar and iteratively built larger chunks. Moreover, the rule segmentation decisions in our approach are driven by a minimum description length objective, whereas the rule chunking decisions were driven by a maximum likelihood objective. We demonstrate empirically how this objective trades off maximum likelihood against model size: the initial grammar fits the training data perfectly (that is, it overfits), and segmentation makes it more parsimonious, gradually generalizing to previously unseen sentence translations as long as the model shrinks enough to warrant a looser fit to the training data. Experimental results show that our approach produces a significantly smaller and better model than the chunking-based approach.
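
The trade-off between model size and fit described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fixed 8-bits-per-symbol rule encoding, and the use of relative rule frequencies in place of biparsing probabilities are all assumptions made for illustration. A candidate segmentation is accepted only if the total description length, model bits plus data-given-model bits, goes down.

```python
import math
from collections import Counter

# A rule is a (source_tokens, target_tokens) pair of tuples.
# ASSUMPTION: every symbol costs a fixed number of bits (uniform code),
# plus one extra symbol per rule as a separator.
def model_length(rules, bits_per_symbol=8.0):
    return sum(bits_per_symbol * (len(src) + len(tgt) + 1) for src, tgt in rules)

# ASSUMPTION: the corpus is treated as a bag of rule uses, each encoded
# with -log2 of its relative frequency (a stand-in for real parsing).
def data_length(counts):
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())

def description_length(counts):
    return model_length(counts.keys()) + data_length(counts)

def try_segment(counts, rule, parts):
    """Replace `rule` by `parts` if that lowers the total description length."""
    candidate = Counter(counts)
    uses = candidate.pop(rule)
    for part in parts:
        candidate[part] += uses
    if description_length(candidate) < description_length(counts):
        return candidate  # shorter overall: accept the segmentation
    return counts         # otherwise keep the grammar unchanged

# Hypothetical toy rules.
the_le = (("the",), ("le",))
cat_chat = (("cat",), ("chat",))
the_cat = (("the", "cat"), ("le", "chat"))
the_dog = (("the", "dog"), ("le", "chien"))

grammar = Counter({the_cat: 1, the_dog: 1, the_le: 1})
grammar = try_segment(grammar, the_cat, [the_le, cat_chat])
```

In this toy setting, splitting a rule pays off only when at least one of its parts is already in the grammar and can be reused. Otherwise the extra rule makes the model longer without any compensating saving, and the segmentation is rejected.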