Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation

  • Authors:
  • Helena M. Caseli;Maria Das Nunes;Mikel L. Forcada

  • Affiliations:
  • NILC-ICMC, University of São Paulo, São Carlos, Brazil;NILC-ICMC, University of São Paulo, São Carlos, Brazil;Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Alacant, Spain 03071

  • Venue:
  • Machine Translation
  • Year:
  • 2006

Abstract

The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work and, as a consequence, bilingual resources are usually more difficult to find than "shallow" monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology for automatically building both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments for Brazilian Portuguese-Spanish and Brazilian Portuguese-English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.
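
To make the idea of extracting a bilingual dictionary from a word-aligned corpus concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes the corpus is already lemmatised and word-aligned, and it simply counts how often each source lemma is linked to each target lemma, keeping the most frequent link. The function name `induce_dictionary`, the `min_count` threshold, and the toy Portuguese-Spanish data are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def induce_dictionary(aligned_pairs, min_count=2):
    """Build a source-to-target dictionary from word-aligned sentence pairs.

    aligned_pairs: iterable of (src_lemmas, tgt_lemmas, links), where links
    is a list of (i, j) index pairs meaning src_lemmas[i] is aligned to
    tgt_lemmas[j]. This input format is an assumption of the sketch.
    """
    counts = defaultdict(Counter)
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1

    dictionary = {}
    for src_lemma, tgt_counts in counts.items():
        best, freq = tgt_counts.most_common(1)[0]
        # Drop rarely seen links, which are more likely alignment noise.
        if freq >= min_count:
            dictionary[src_lemma] = best
    return dictionary


# Toy, hand-made data (hypothetical, not taken from the paper).
toy_corpus = [
    (["o", "livro"], ["el", "libro"], [(0, 0), (1, 1)]),
    (["um", "livro"], ["un", "libro"], [(0, 0), (1, 1)]),
    (["o", "caderno"], ["el", "cuaderno"], [(0, 0), (1, 1)]),
]

print(induce_dictionary(toy_corpus))
```

With the toy data, only the links seen at least twice survive the threshold. The methodology summarised in the abstract also draws on morphological analysis and part-of-speech tags, and covers multi-word correspondences and transfer rules, none of which this sketch attempts.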