Bilingual Text Classification

  • Authors:
  • Jorge Civera; Elsa Cubel; Enrique Vidal

  • Affiliations:
  • Instituto Tecnológico de Informática, Universidad Politécnica de Valencia (all authors)

  • Venue:
  • IbPRIA '07 Proceedings of the 3rd Iberian conference on Pattern Recognition and Image Analysis, Part I
  • Year:
  • 2007

Abstract

Bilingual documentation has become common in official institutions and private companies, which makes the categorization of bilingual text a useful tool. This paper proposes several approaches to this bilingual classification task. On the one hand, three finite-state transducer algorithms from the grammatical inference framework are presented. On the other hand, a naive combination of smoothed n-gram models is introduced. To evaluate the bilingual classifiers, two categorized bilingual corpora of different complexity were considered. Experiments on a limited-domain task show that all the models obtain similar results. However, results on a more open-domain task indicate that the naive approach performs best.
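
The abstract does not give the n-gram order, the smoothing method, or the combination rule, so the following is only a minimal sketch of one plausible reading of a "naive combination of smoothed n-gram models". It assumes bigram models with add-one smoothing, one model per class and per language, and scores a bilingual sample by adding the class log-prior to the log-probabilities of its two language sides (that is, treating the two sides as independent given the class). The names SmoothedBigramModel and NaiveBilingualClassifier are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


class SmoothedBigramModel:
    """Bigram language model with add-one (Laplace) smoothing (an assumption)."""

    def __init__(self):
        self.bigrams = Counter()   # counts of (previous word, word)
        self.contexts = Counter()  # counts of the previous word
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[(prev, word)] += 1
                self.contexts[prev] += 1

    def log_prob(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot reserved for unseen words
        return sum(
            math.log((self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + v))
            for prev, word in zip(tokens, tokens[1:])
        )


class NaiveBilingualClassifier:
    """Hypothetical classifier: argmax over classes of
    log P(c) + log P(source | c) + log P(target | c)."""

    def __init__(self):
        self.source_models = defaultdict(SmoothedBigramModel)
        self.target_models = defaultdict(SmoothedBigramModel)
        self.class_counts = Counter()

    def train(self, samples):
        # samples: iterable of (source_text, target_text, label) triples
        for source, target, label in samples:
            self.source_models[label].train([source])
            self.target_models[label].train([target])
            self.class_counts[label] += 1

    def classify(self, source, target):
        total = sum(self.class_counts.values())

        def score(label):
            prior = math.log(self.class_counts[label] / total)
            return (prior
                    + self.source_models[label].log_prob(source)
                    + self.target_models[label].log_prob(target))

        return max(self.class_counts, key=score)


# Toy usage with invented data (not taken from the paper's corpora).
toy_classifier = NaiveBilingualClassifier()
toy_classifier.train([
    ("mañana lloverá en valencia", "it will rain tomorrow in valencia", "weather"),
    ("quiero una habitación doble", "i want a double room", "travel"),
])
predicted = toy_classifier.classify("lloverá en valencia", "it will rain in valencia")
```

Because the two language sides are scored by separate models and simply summed in log space, either side can be dropped at test time by omitting its term. The sketch makes no attempt to model the translation relation between the two sides, which is the role the abstract assigns to the finite-state transducer approaches.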