Optimization of word alignment clues

Authors:
Jörg Tiedemann
Affiliations:
Alfa-Informatica, University of Groningen, Groningen, The Netherlands; e-mail: tiedeman@let.rug.nl
Venue:
Natural Language Engineering
Year:
2005

Abstract

Statistical, linguistic, and heuristic clues can be used to align words and multi-word units in parallel texts. This article describes the clue alignment approach and the optimization of its parameters using a genetic algorithm. Word alignment clues can come from various sources, such as statistical alignment models, co-occurrence tests, string similarity scores, and static dictionaries. A genetic algorithm, which implements an evolutionary search procedure, can be used to optimize the parameters needed for combining the available clues. Experiments on an English/Swedish bitext show a significant improvement of about 6% in F-score over the baseline produced by statistical word alignment.
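The abstract does not specify how clues are combined or which genetic operators are used. The following is therefore only a minimal Python sketch under explicit assumptions, not the article's implementation. It assumes that each clue assigns a strength in [0, 1] to a source/target word pair. It also assumes that weighted clues are combined as a disjunction of independent events, 1 - prod(1 - w_i * c_i), and that a link is kept whenever the combined score reaches a threshold. The genetic algorithm (truncation selection, uniform crossover, Gaussian mutation) searches over the clue weights and that threshold to maximize F-score against a reference alignment. The function names (`combine_clues`, `align`, `f_score`, `evolve`), the two clue sources, and the toy clue values are all hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]           # (source word index, target word index)
ClueMatrix = Dict[Link, float]   # hypothetical clue strength in [0, 1] per word pair


def combine_clues(clues: List[ClueMatrix], weights: List[float]) -> ClueMatrix:
    """Assumed rule: treat weighted clues as independent events and take
    their disjunction, 1 - prod(1 - w_i * c_i)."""
    links: Set[Link] = set().union(*clues)
    combined: ClueMatrix = {}
    for link in links:
        miss = 1.0  # probability that no clue supports this link
        for clue, w in zip(clues, weights):
            miss *= 1.0 - w * clue.get(link, 0.0)
        combined[link] = 1.0 - miss
    return combined


def align(combined: ClueMatrix, threshold: float) -> Set[Link]:
    """Simplified linking step: keep every pair whose combined clue reaches the threshold."""
    return {link for link, score in combined.items() if score >= threshold}


def f_score(predicted: Set[Link], reference: Set[Link]) -> float:
    """Balanced F-score of predicted links against a reference alignment."""
    correct = len(predicted & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def evolve(clues: List[ClueMatrix], reference: Set[Link], pop_size: int = 20,
           generations: int = 30, mutation: float = 0.1,
           seed: int = 0) -> Tuple[List[float], float]:
    """Search clue weights plus a link threshold that maximize F-score."""
    rng = random.Random(seed)
    n_genes = len(clues) + 1  # one weight per clue; the last gene is the threshold

    def fitness(genome: List[float]) -> float:
        weights, threshold = genome[:-1], genome[-1]
        return f_score(align(combine_clues(clues, weights), threshold), reference)

    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1].
            child = [rng.choice(genes) for genes in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation))) for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy sentence pair with two hypothetical clue sources and a gold alignment.
statistical = {(0, 0): 0.9, (1, 1): 0.6, (2, 2): 0.3, (0, 1): 0.2, (1, 2): 0.25}
string_sim = {(2, 2): 0.8, (0, 0): 0.2, (1, 2): 0.1}
gold = {(0, 0), (1, 1), (2, 2)}

best, score = evolve([statistical, string_sim], gold)
print("weights:", [round(w, 2) for w in best[:-1]], "threshold:", round(best[-1], 2))
print("F-score:", round(score, 3))
```

Because the fitter half of each generation is carried over unchanged, the best F-score in the population never decreases from one generation to the next. A realistic setup would compute fitness over a manually aligned sample of many sentence pairs rather than a single toy pair.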