Multilingual aligned parallel treebank corpus reflecting contextual information and its applications

  • Authors:
  • Kiyotaka Uchimoto; Yujie Zhang; Kiyoshi Sudo; Masaki Murata; Satoshi Sekine; Hitoshi Isahara

  • Affiliations:
  • National Institute of Information and Communications Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan; National Institute of Information and Communications Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan; New York University, New York, NY; National Institute of Information and Communications Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan; New York University, New York, NY; National Institute of Information and Communications Technology, Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan

  • Venue:
  • MLR '04 Proceedings of the Workshop on Multilingual Linguistic Ressources
  • Year:
  • 2004

Abstract

This paper describes Japanese-English-Chinese aligned parallel treebank corpora of newspaper articles. The corpora were constructed by translating each sentence in the Penn Treebank and the Kyoto University text corpus into a corresponding natural sentence in the target language. Each sentence is translated so as to reflect its contextual information and is annotated with morphological and syntactic structures and phrasal alignments. The paper also describes possible applications of the parallel corpus and proposes a new framework to aid translation. In this framework, parallel translations whose source-language sentence is similar to a given sentence can be generated semi-automatically. We show that the framework can be realized using our aligned parallel treebank corpus.
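
The retrieval step behind such a framework, finding stored translation pairs whose source sentence resembles a given sentence, can be illustrated with a minimal sketch. The abstract does not say how similarity is measured, so the example below substitutes plain token-set (Jaccard) overlap as an assumption; it does not use the treebank or alignment annotations. The names `AlignedPair`, `jaccard`, and `most_similar`, the toy sentences, and the placeholder target strings are all hypothetical and are not taken from the corpus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """Hypothetical corpus entry: a source sentence and its translation."""
    source: str
    target: str


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query: str, corpus: list[AlignedPair], top_k: int = 1) -> list[AlignedPair]:
    """Return the top_k pairs whose source sentence best overlaps the query.

    Assumption: whitespace tokenization and Jaccard overlap stand in for
    whatever similarity measure the paper actually uses.
    """
    query_tokens = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda pair: jaccard(query_tokens, set(pair.source.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data invented for illustration; targets are placeholders, not real translations.
corpus = [
    AlignedPair("the company reported higher profits", "<translation of sentence 1>"),
    AlignedPair("the weather was cold yesterday", "<translation of sentence 2>"),
]
best = most_similar("the company reported lower profits", corpus)
```

Here `best[0]` is the first pair, since its source sentence shares four of six distinct tokens with the query. A system built on the aligned treebank could go further by using the phrasal alignment to replace only the differing phrase in the retrieved translation; that step is not sketched here.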