Aligning Multiword Terms Using a Hybrid Approach

  • Authors:
  • Arantza Casillas; Raquel Martínez

  • Venue:
  • CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2002

Abstract

In the context of research on aligning parallel corpora for language pairs that differ in several important respects (e.g., structural, lexical, and morphosyntactic), this paper presents an approach to multiword term alignment. Our system, ALINTEC, implements a hybrid strategy that combines various kinds of linguistic knowledge (a corpus aligned at the sentence level, POS tagging, grammatical patterns, and a bilingual glossary) with quantitative criteria such as the frequency and distribution of terms in the corpus. The experiments were carried out on a parallel corpus consisting of a collection of administrative and legal documents in Spanish and Basque, a language pair representative of the context in which our work is framed. The results show that the approach performs reasonably well at aligning terms between typologically different languages such as Spanish and Basque.
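
The abstract does not give ALINTEC's scoring details, so the following is only a minimal sketch of the quantitative side of such a hybrid strategy, under stated assumptions. It assumes that candidate multiword terms have already been extracted for each aligned sentence pair (for instance by POS tagging and grammatical patterns), and it ranks candidate translations with the Dice coefficient over co-occurrence counts. The function name `dice_align`, the `min_cooc` parameter, and the choice of the Dice coefficient are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import product


def dice_align(pairs, min_cooc=1):
    """Pick, for each source term, the target term with the highest Dice score.

    `pairs` is a list of (source_terms, target_terms) tuples, one per aligned
    sentence pair; each element is an iterable of candidate multiword terms.
    This is a hypothetical helper, not ALINTEC's actual scoring function.
    """
    src_freq = Counter()  # number of sentence pairs containing each source term
    tgt_freq = Counter()  # number of sentence pairs containing each target term
    co_freq = Counter()   # number of sentence pairs containing both terms

    for src_terms, tgt_terms in pairs:
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        co_freq.update(product(src_set, tgt_set))

    best = {}
    for (src, tgt), cooc in co_freq.items():
        if cooc < min_cooc:
            continue  # drop pairs that co-occur too rarely to be trusted
        score = 2 * cooc / (src_freq[src] + tgt_freq[tgt])
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    return best
```

Called on a toy list of Spanish and Basque candidates such as `[({"boletín oficial", "diputación foral"}, {"aldizkari ofiziala", "foru aldundia"}), ({"boletín oficial"}, {"aldizkari ofiziala"}), ({"diputación foral"}, {"foru aldundia"})]`, the function pairs each Spanish term with the Basque term it co-occurs with most consistently. In a hybrid setting, a bilingual glossary could then be used to confirm or override these purely statistical choices.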