When hidden-Markov-model-based part-of-speech (PoS) taggers used in machine translation systems are trained in an unsupervised manner, the use of target-language information has proven to give better results than the standard Baum-Welch algorithm. The target-language-driven training algorithm proceeds by translating into the target language every possible PoS tag sequence resulting from the disambiguation of the words in each source-language text segment, and by using a target-language model to estimate the likelihood of each possible disambiguation's translation. The main disadvantage of this method is that the number of translations to perform grows exponentially with segment length, translation being the most time-consuming task. In this paper, we present a method that uses a priori knowledge, obtained in an unsupervised manner, to prune unlikely disambiguations in each text segment, so that the number of translations to be performed during training is reduced. The experimental results show that this new pruning method drastically reduces the number of translations done during training (and, consequently, the time complexity of the algorithm) without degrading the tagging accuracy achieved.
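The combinatorial problem and the pruning idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ambiguity classes, the per-tag a priori probabilities, and the relative threshold used here are invented for the example, standing in for whatever a priori knowledge the unsupervised method actually provides.

```python
from itertools import product


def disambiguation_paths(segment_tags):
    """Enumerate every possible PoS tag sequence for a segment.

    segment_tags is a list with one tuple of candidate tags per word;
    the number of paths is the product of the ambiguity-class sizes,
    which grows exponentially with segment length.
    """
    return list(product(*segment_tags))


def prune_paths(paths, tag_probs, threshold):
    """Discard unlikely disambiguations before translating them.

    A path's a priori score is the product of its tags' probabilities
    (a deliberately crude stand-in for the real a priori model); only
    paths scoring at least `threshold` times the best path survive,
    so only those need to be translated and scored by the
    target-language model.
    """
    def score(path):
        p = 1.0
        for tag in path:
            p *= tag_probs.get(tag, 1e-6)  # tiny floor for unseen tags
        return p

    scores = {path: score(path) for path in paths}
    best = max(scores.values())
    return [path for path in paths if scores[path] >= threshold * best]


# Hypothetical three-word segment: two words are ambiguous, one is not.
segment = [("NOUN", "VERB"), ("DET",), ("NOUN", "ADJ")]
tag_probs = {"NOUN": 0.5, "VERB": 0.2, "DET": 0.9, "ADJ": 0.3}

paths = disambiguation_paths(segment)      # 2 * 1 * 2 = 4 paths
pruned = prune_paths(paths, tag_probs, threshold=0.5)
```

With these invented numbers, only 2 of the 4 paths survive pruning, halving the number of translations that the target-language-driven training step would have to perform for this segment.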