Statistical Machine Translation of Broadcast News from Spanish to Portuguese

  • Authors:
  • Raquel Sánchez Martínez;João Paulo Silva Neto;Diamantino António Caseiro

  • Affiliations:
  • L2F - Spoken Language Systems Laboratory, INESC ID Lisboa, Lisboa, Portugal 1000-029 (all three authors)

  • Venue:
  • PROPOR '08: Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2008

Abstract

This paper describes the development of an automatic system for translating broadcast news from Spanish to Portuguese. The work involved two challenging areas of speech and language processing: Automatic Speech Recognition (ASR) of the Spanish news, and Statistical Machine Translation (SMT) of the recognized text into Portuguese. The ASR component is based on the AUDIMUS.MEDIA system, a hybrid ANN/HMM system with multiple-stream decoding. It achieved a Word Error Rate (WER) of 22.08% on a Spanish Broadcast News task, which is comparable to other international state-of-the-art systems. The Spanish-to-Portuguese SMT system was trained on normalized parallel texts from the European Parliament database. A preliminary, non-exhaustive human evaluation gave a fluency score of 3.74 and a sufficiency score of 4.23.
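
For readers unfamiliar with the WER figure quoted in the abstract, the metric is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a recognizer's output and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not the authors' scoring tool, the function name `word_error_rate` is hypothetical, and the whitespace tokenization is an assumption (real evaluations usually normalize text first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example sentences, not data from the paper.
    reference = "el presidente habló hoy en madrid"
    hypothesis = "el presidente hablo hoy madrid"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

In the made-up example, the hypothesis contains one substituted word and one deleted word against a six-word reference, so two errors are counted over six reference words.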