A chunk-driven bootstrapping approach to extracting translation patterns

Authors:
Lieve Macken; Walter Daelemans
Affiliations:
LT3, University College Ghent, Ghent, Belgium; CLiPS Computational Linguistics Group, University of Antwerp, Antwerpen, Belgium
Venue:
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2010

Abstract

We present a linguistically motivated sub-sentential alignment system that extends the intersected IBM Model 4 word alignments. The alignment system is chunk-driven and requires only shallow linguistic processing tools for the source and target languages, i.e., part-of-speech taggers and chunkers. We conceive the sub-sentential aligner as a cascaded model consisting of two phases. In the first phase, anchor chunks are linked on the basis of the intersected word alignments and syntactic similarity. In the second phase, we use a bootstrapping approach to extract more complex translation patterns. The results show an overall reduction in alignment error rate (AER) and competitive F-measures compared with the commonly used symmetrized IBM Model 4 predictions (intersection, union and grow-diag-final) on six different text types for English-Dutch. In particular, compared with the intersected word alignments, the proposed method improves recall without sacrificing precision. Moreover, the system is able to align discontiguous chunks, which occur frequently in Dutch.
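
To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of how two directional word alignments can be symmetrized by intersection or union, and how precision, recall and AER are conventionally computed against a gold standard with sure and possible links. It is an illustration only and not the authors' implementation: the function names (`intersect`, `union`, `evaluate`) and the toy link sets are hypothetical, and the chunk-based and bootstrapping phases of the system are not modelled here.

```python
from typing import Set, Tuple

# A link pairs a source word index with a target word index (hypothetical representation).
Link = Tuple[int, int]


def intersect(src_to_tgt: Set[Link], tgt_to_src: Set[Link]) -> Set[Link]:
    """Keep only links proposed in both translation directions (high precision)."""
    return src_to_tgt & tgt_to_src


def union(src_to_tgt: Set[Link], tgt_to_src: Set[Link]) -> Set[Link]:
    """Keep links proposed in either direction (high recall)."""
    return src_to_tgt | tgt_to_src


def evaluate(predicted: Set[Link], sure: Set[Link], possible: Set[Link]):
    """Return (precision, recall, AER) for a predicted alignment.

    Uses the conventional definitions, in which sure links count as possible too:
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    possible = possible | sure
    hit_sure = len(predicted & sure)
    hit_possible = len(predicted & possible)
    precision = hit_possible / len(predicted) if predicted else 0.0
    recall = hit_sure / len(sure) if sure else 0.0
    denominator = len(predicted) + len(sure)
    aer = 1.0 - (hit_sure + hit_possible) / denominator if denominator else 0.0
    return precision, recall, aer


# Toy data (invented for illustration): links are (source index, target index).
s2t = {(0, 0), (1, 1), (2, 2), (3, 2)}
t2s = {(0, 0), (1, 1), (2, 3)}
gold_sure = {(0, 0), (1, 1), (2, 2)}
gold_possible = {(3, 2)}

for name, alignment in [("intersection", intersect(s2t, t2s)),
                        ("union", union(s2t, t2s))]:
    p, r, aer = evaluate(alignment, gold_sure, gold_possible)
    print(f"{name}: precision={p:.3f} recall={r:.3f} AER={aer:.3f}")
```

In this toy case the intersection is fully precise but misses one sure link, whereas the union recovers every sure link at the cost of one spurious link. This is the trade-off the abstract refers to when it describes improving recall over the intersected alignments without sacrificing precision.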