Parsing word-aligned parallel corpora in a grammar induction context

Authors:
Jonas Kuhn
Affiliations:
The University of Texas at Austin
Venue:
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Year:
2005

Abstract

We present an Earley-style dynamic programming algorithm for parsing sentence pairs from a parallel corpus simultaneously, building up two phrase structure trees and a correspondence mapping between their nodes. The algorithm is intended for bootstrapping grammars for less-studied languages by exploiting the implicit grammatical information in parallel corpora. We therefore presuppose a given (statistical) word alignment underlying the synchronous parsing task; this leads to a significant reduction in parsing complexity. The theoretical complexity results are corroborated by a quantitative evaluation in which we ran an implementation of the algorithm on a suite of test sentences from the Europarl parallel corpus.
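
To make the complexity argument concrete, the following is a minimal sketch of how a fixed word alignment can prune the space of span pairs a synchronous parser has to consider. It is not the algorithm from the paper: it assumes contiguous spans in both languages and uses a simple consistency check familiar from phrase extraction, and all names (`projected_span`, `is_consistent`, `consistent_span_pairs`) are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# A word alignment as a set of (source_index, target_index) links.
# This representation is an assumption made for illustration.
Alignment = Set[Tuple[int, int]]
Span = Tuple[int, int]  # half-open interval [start, end)


def projected_span(alignment: Alignment, start: int, end: int) -> Optional[Span]:
    """Smallest target span covering every target word linked to
    source words in [start, end); None if the span has no links."""
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None
    return min(targets), max(targets) + 1


def is_consistent(alignment: Alignment, start: int, end: int) -> bool:
    """True if no target word inside the projected span is linked to a
    source word outside [start, end)."""
    span = projected_span(alignment, start, end)
    if span is None:
        return False
    lo, hi = span
    return all(start <= s < end for s, t in alignment if lo <= t < hi)


def consistent_span_pairs(alignment: Alignment, src_len: int) -> List[Tuple[Span, Span]]:
    """Enumerate (source span, target span) pairs licensed by the alignment."""
    pairs = []
    for i in range(src_len):
        for j in range(i + 1, src_len + 1):
            if is_consistent(alignment, i, j):
                pairs.append(((i, j), projected_span(alignment, i, j)))
    return pairs


# Toy example: three source words, the last two swapped on the target side.
alignment = {(0, 0), (1, 2), (2, 1)}
pairs = consistent_span_pairs(alignment, src_len=3)
```

In this toy case the source span covering words 0 and 1 is ruled out, because its projection onto the target side would also cover a word linked to source word 2. Without the alignment, every source span would have to be paired with every target span; with it, each source span has at most one candidate target span under these simplifying assumptions.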