Using target-language information to train part-of-speech taggers for machine translation

Authors:
Felipe Sánchez-Martínez;Juan Antonio Pérez-Ortiz;Mikel L. Forcada
Affiliation (all authors):
Dept. de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Alacant, Spain 03071
Venue:
Machine Translation
Year:
2008

Abstract

Although corpus-based approaches to machine translation (MT) are attracting growing interest, they cannot be applied to less-resourced language pairs for which no parallel corpora are available; in those cases, the rule-based approach is the only applicable solution. Most rule-based MT systems use part-of-speech (PoS) taggers to resolve the PoS ambiguities in the source-language (SL) texts to be translated, and they need accurate PoS taggers to produce reliable translations in the target language (TL). The standard statistical approach to PoS ambiguity resolution (or tagging) uses hidden Markov models (HMMs) trained either in a supervised way from hand-tagged corpora, an expensive resource that is not always available, or in an unsupervised way through the Baum-Welch expectation-maximization algorithm; both methods use information only from the language being tagged. However, when tagging is an intermediate task in the translation procedure, that is, when the PoS tagger is embedded as a module within an MT system, information from the TL can be used in the training phase, without supervision, to increase the translation quality of the whole MT system. This paper presents a method to train HMM-based PoS taggers for use in MT; the new method uses not only information from the SL, as general-purpose methods do, but also information from the TL and from the remaining modules of the MT system in which the PoS tagger is to be embedded. We find that the translation quality of an MT system embedding a PoS tagger trained in an unsupervised manner through this new method is clearly better than that of the same MT system embedding a PoS tagger trained through the Baum-Welch algorithm, and comparable to that obtained by embedding a PoS tagger trained in a supervised way from hand-tagged corpora.
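
The abstract does not spell out how TL information enters training, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that every candidate tag sequence of an SL segment is translated by the rest of the MT system, that a TL model scores each translation, and that the normalized scores serve as fractional counts for the HMM tag-transition probabilities. All names (`path_weights`, `estimate_transitions`, `toy_translate`, `toy_tl_score`) and the toy lexicon and scores are hypothetical.

```python
from collections import defaultdict
from itertools import product


def path_weights(words, paths, translate, tl_score):
    """Weight each tag sequence by the TL score of its translation (sums to 1)."""
    if not paths:
        return []
    scores = [tl_score(translate(words, path)) for path in paths]
    total = sum(scores)
    if total == 0:
        # Assumption: fall back to uniform weights when the TL model gives no signal.
        return [1.0 / len(paths)] * len(paths)
    return [score / total for score in scores]


def estimate_transitions(segments, translate, tl_score):
    """Estimate tag-transition probabilities from TL-weighted fractional counts."""
    counts = defaultdict(lambda: defaultdict(float))
    for words, candidate_tags in segments:
        # Every tag sequence compatible with the per-word candidate tags.
        paths = list(product(*candidate_tags))
        weights = path_weights(words, paths, translate, tl_score)
        for path, weight in zip(paths, weights):
            for prev_tag, tag in zip(path, path[1:]):
                counts[prev_tag][tag] += weight
    # Normalize each row of counts into a probability distribution.
    return {
        prev_tag: {tag: c / sum(row.values()) for tag, c in row.items()}
        for prev_tag, row in counts.items()
    }


# Hypothetical stand-ins for the remaining MT modules and the TL model.
TOY_LEXICON = {
    ("the", "DET"): "los",
    ("books", "NOUN"): "libros",
    ("books", "VERB"): "reserva",
}
TOY_TL_SCORES = {"los libros": 0.9, "los reserva": 0.1}


def toy_translate(words, path):
    """Translate word by word, choosing the TL word by (SL word, tag)."""
    return " ".join(TOY_LEXICON[(word, tag)] for word, tag in zip(words, path))


def toy_tl_score(text):
    """Look up a made-up TL likelihood; unseen strings score 0."""
    return TOY_TL_SCORES.get(text, 0.0)


segments = [(("the", "books"), [("DET",), ("NOUN", "VERB")])]
probs = estimate_transitions(segments, toy_translate, toy_tl_score)
print(probs)
```

In this toy run the ambiguous word "books" ends up tagged mostly as a noun after a determiner, because the noun reading yields the translation that the made-up TL scores prefer. No hand-tagged SL data is involved, which is the sense in which such training would be unsupervised.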