Aligning Portuguese and Chinese parallel texts using confidence bands

  • Authors:
  • António Ribeiro;Gabriel Lopes;João Mexia

  • Affiliations:
  • Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Departamento de Informática, Monte da Caparica, Portugal;Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Departamento de Informática, Monte da Caparica, Portugal;Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Departamento de Informática, Monte da Caparica, Portugal

  • Venue:
  • PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence
  • Year:
  • 2000

Abstract

This paper describes a language-independent method for aligning parallel texts that makes use of tokens which are homographs in a pair of languages. We show that even for languages as different as Portuguese and Chinese, homographs can be used with great reliability. This work was originally inspired by, and extends, work by Pascale Fung & Kathleen McKeown, and by Melamed. To filter out words that may cause misalignment, we use confidence bands of linear regression lines instead of statistically unsupported heuristics. The resulting alignment algorithm is entirely statistically supported.
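
The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of the general idea of filtering with a regression confidence band, not the authors' algorithm. It assumes each candidate is a pair (position of a homograph token in the source text, position of the same token in the target text), fits an ordinary least-squares line through the candidates, and keeps only the points that lie inside the confidence band of that line. The names `fit_line` and `filter_by_confidence_band`, the `confidence` parameter, the single filtering pass, and the use of a normal quantile in place of the Student t quantile are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fit_line(points):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, mean_x, sxx)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b, mean_x, sxx


def filter_by_confidence_band(points, confidence=0.99):
    """Keep the points lying inside the confidence band of the fitted line.

    Hypothetical helper: one filtering pass, with a normal quantile used as
    a large-sample stand-in for the Student t quantile (an assumption).
    """
    n = len(points)
    # With fewer than 3 points, or no spread in x, no band can be estimated.
    if n < 3 or len({x for x, _ in points}) < 2:
        return list(points)

    a, b, mean_x, sxx = fit_line(points)

    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((y - (a + b * x)) ** 2 for x, y in points)
    s = math.sqrt(rss / (n - 2))

    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    kept = []
    for x, y in points:
        # Half-width of the band for the regression line at this x.
        half_width = z * s * math.sqrt(1 / n + (x - mean_x) ** 2 / sxx)
        if abs(y - (a + b * x)) <= half_width:
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    # Ten candidates on the diagonal plus one stray candidate far from it.
    candidates = [(i * 10, i * 10) for i in range(10)] + [(45, 200)]
    print(filter_by_confidence_band(candidates))
```

In this toy input the stray pair (45, 200) falls outside the band and is dropped, while the ten pairs along the diagonal are kept. A practical aligner would probably repeat the fit after each filtering pass, since a single outlier shifts the fitted line and widens the band.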