Towards the improvement of statistical translation models using linguistic features

Authors:
Alicia Pérez;Inés Torres;Francisco Casacuberta
Affiliations:
Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco;Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco;Departamento de Sistemas Informáticos y Computación, Institut Tecnològic d’Informàtica, Universidad Politécnica de Valencia
Venue:
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Year:
2006

Abstract

Statistical translation models can be inferred from bilingual samples whenever enough training data are available. However, bilingual corpora are usually too scarce to yield reliable statistical models, particularly for highly inflected or agglutinative languages, where many words appear just once. Such rare events often distort the statistics. To cope with this problem, we turn to morphological knowledge. Instead of dealing only with running words, we also take advantage of lemmas, producing the translation in two stages. In the first stage we transform the source sentence into a lemmatized target sentence, and in the second stage we convert the lemmatized target sentence into the target full forms.
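
The two-stage idea can be illustrated with a minimal sketch. This is not the authors' system: plain lookup tables stand in for the statistical models, and every name, table entry, and sentence below is invented for illustration only. The first part shows why lemmas ease data sparsity (fewer types occur only once), and the second part chains a source-to-lemma stage with a lemma-to-full-form stage.

```python
from collections import Counter


def singleton_rate(tokens):
    """Fraction of distinct types that occur exactly once in `tokens`."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Hypothetical hand-written lemma table (assumption, toy data).
LEMMA = {"walks": "walk", "walked": "walk", "walking": "walk", "houses": "house"}


def lemmatize(tokens):
    """Replace each word form by its lemma; unknown forms are kept as-is."""
    return [LEMMA.get(t, t) for t in tokens]


# Stage 1 stand-in (assumption): source word -> target lemma.
SRC_TO_LEMMA = {"ella": "she", "camina": "walk"}

# Stage 2 stand-in (assumption): (previous target word, lemma) -> full form.
FULL_FORM = {("she", "walk"): "walks"}


def stage1(source_tokens):
    """Map the source sentence to a lemmatized target sentence."""
    return [SRC_TO_LEMMA.get(w, w) for w in source_tokens]


def stage2(lemmas):
    """Map target lemmas to full forms, using the previous output word as context."""
    out = []
    for lemma in lemmas:
        prev = out[-1] if out else "<s>"
        # Fall back to the bare lemma when no inflected form is known.
        out.append(FULL_FORM.get((prev, lemma), lemma))
    return out


def translate(source_tokens):
    """Compose the two stages."""
    return stage2(stage1(source_tokens))


if __name__ == "__main__":
    forms = ["walks", "walked", "walking", "house", "houses"]
    print(singleton_rate(forms), singleton_rate(lemmatize(forms)))
    print(translate(["ella", "camina"]))
```

In this toy corpus every word form is a singleton, while after lemmatization none is, which is the sparsity effect the abstract describes. In a real system each table would be replaced by a model estimated from a bilingual corpus.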