Spanish-Basque parallel corpus structure: linguistic annotations and translation units

  • Authors:
  • A. Casillas; A. Díaz de Ilarraza; J. Igartua; R. Martínez; K. Sarasola; A. Sologaistoa

  • Affiliations:
  • Dpt. Electricidad y Electrónica, UPV-EHU; IXA Taldea; IXA Taldea; NLP&IR Group, UNED; IXA Taldea; IXA Taldea

  • Venue:
  • TSD'07: Proceedings of the 10th International Conference on Text, Speech and Dialogue
  • Year:
  • 2007

Abstract

In this paper we propose a corpus structure that represents and manages an aligned parallel corpus. The structure is based on a stand-off annotation model composed of several XML documents. A bilingual parallel corpus represented in the proposed structure contains: (1) the entire corpus together with its corresponding linguistic information, and (2) translation units and alignment relations between units of the two languages at three levels: paragraphs, sentences and named entities. The proposed structure makes it possible to work with the corpus both as a linguistically annotated corpus and as a translation memory.
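
To make the stand-off idea concrete, the following is a minimal sketch in Python of how sentence-level translation units might be recovered from separate XML documents. The element and attribute names (`segments`, `s`, `start`, `end`, `alignments`, `link`, `source`, `target`), the character-offset convention and the sample texts are assumptions made for illustration only; the abstract does not specify the schema actually used.

```python
import xml.etree.ElementTree as ET

# Raw texts are kept untouched; annotations live in separate documents.
ES_TEXT = "Hola mundo. Hasta luego."
EU_TEXT = "Kaixo mundua. Gero arte."

# Hypothetical stand-off segmentation: each <s> points into the raw text
# by character offsets (start inclusive, end exclusive).
ES_SEGMENTS = """<segments lang="es">
  <s id="es-s1" start="0" end="11"/>
  <s id="es-s2" start="12" end="24"/>
</segments>"""

EU_SEGMENTS = """<segments lang="eu">
  <s id="eu-s1" start="0" end="13"/>
  <s id="eu-s2" start="14" end="24"/>
</segments>"""

# Hypothetical alignment document: links refer to segment ids only,
# so it never duplicates text or linguistic annotation.
ALIGNMENTS = """<alignments level="sentence">
  <link source="es-s1" target="eu-s1"/>
  <link source="es-s2" target="eu-s2"/>
</alignments>"""


def load_spans(segments_xml, raw_text):
    """Map each segment id to the text span it points at."""
    root = ET.fromstring(segments_xml)
    return {
        s.get("id"): raw_text[int(s.get("start")):int(s.get("end"))]
        for s in root.iter("s")
    }


def translation_units(alignments_xml, source_spans, target_spans):
    """Resolve alignment links into (source, target) text pairs."""
    root = ET.fromstring(alignments_xml)
    return [
        (source_spans[link.get("source")], target_spans[link.get("target")])
        for link in root.iter("link")
    ]


es_spans = load_spans(ES_SEGMENTS, ES_TEXT)
eu_spans = load_spans(EU_SEGMENTS, EU_TEXT)
units = translation_units(ALIGNMENTS, es_spans, eu_spans)
```

Under these assumptions, `units` plays the role of a small translation memory, while the segment documents remain available for attaching further linguistic annotation by id. Paragraph and named-entity alignments could be stored in additional documents of the same shape.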