Different approaches to bilingual text classification based on grammatical inference techniques

  • Authors:
  • Jorge Civera;Elsa Cubel;Alfons Juan;Enrique Vidal

  • Affiliations:
  • Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia;Instituto Tecnológico de Informática, Universidad Politécnica de Valencia;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia;Instituto Tecnológico de Informática, Universidad Politécnica de Valencia

  • Venue:
  • IbPRIA'05 Proceedings of the Second Iberian conference on Pattern Recognition and Image Analysis - Volume Part II
  • Year:
  • 2005

Abstract

Bilingual documentation has become common in many official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool that can also be applied in the machine translation field. To tackle this classification task, different approaches are proposed. On the one hand, two finite-state transducer algorithms from the grammatical inference domain are discussed. On the other hand, the well-known naive Bayes approximation is presented along with a possible model based on n-gram language models. Experiments carried out on a bilingual corpus demonstrate the adequacy of these methods and the relevance of a second information source in text classification, as supported by classification error rates. A relative error reduction of 29% with respect to the best previous results on the monolingual version of the same task has been obtained.
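To make the naive Bayes idea concrete, the following is a minimal sketch of a bilingual classifier and not the authors' implementation. It assumes that the source and target sentences are conditionally independent given the class, and it uses add-alpha smoothed unigram models (n = 1) in place of general n-gram models for brevity. The class name `BilingualNaiveBayes`, the labels, and the Spanish-English sentence pairs are all hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict


class BilingualNaiveBayes:
    """Toy model: score(c) = log p(c) + log p(src | c) + log p(tgt | c).

    Assumption (not from the paper): each language is modeled by a
    smoothed unigram model, and both are independent given the class.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant
        self.class_counts = Counter()
        # word_counts[lang][label] holds word frequencies for that class
        self.word_counts = {"src": defaultdict(Counter),
                            "tgt": defaultdict(Counter)}
        self.vocab = {"src": set(), "tgt": set()}

    def fit(self, pairs, labels):
        for (src, tgt), label in zip(pairs, labels):
            self.class_counts[label] += 1
            for lang, sentence in (("src", src), ("tgt", tgt)):
                words = sentence.lower().split()
                self.word_counts[lang][label].update(words)
                self.vocab[lang].update(words)
        return self

    def _log_likelihood(self, lang, sentence, label):
        counts = self.word_counts[lang][label]
        total = sum(counts.values())
        size = len(self.vocab[lang]) + 1  # +1 reserves mass for unseen words
        return sum(
            math.log((counts[w] + self.alpha) / (total + self.alpha * size))
            for w in sentence.lower().split()
        )

    def predict(self, src, tgt):
        n = sum(self.class_counts.values())

        def score(label):
            prior = math.log(self.class_counts[label] / n)
            return (prior
                    + self._log_likelihood("src", src, label)
                    + self._log_likelihood("tgt", tgt, label))

        return max(self.class_counts, key=score)


# Hypothetical training data: (Spanish, English) sentence pairs with labels.
pairs = [
    ("la habitación tiene vistas al mar", "the room has a sea view"),
    ("quiero reservar una habitación doble", "I want to book a double room"),
    ("el ordenador no arranca", "the computer does not start"),
    ("reinicie el ordenador y la impresora", "restart the computer and the printer"),
]
labels = ["hotel", "hotel", "tech", "tech"]

clf = BilingualNaiveBayes().fit(pairs, labels)
print(clf.predict("una habitación con vistas", "a room with a view"))
```

Dropping the `tgt` term from `score` gives a monolingual classifier under the same assumptions, which is one simple way to see how the second language acts as an additional information source.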