Bilingual Text Classification

Authors:
Jorge Civera;Elsa Cubel;Enrique Vidal
Affiliations:
Instituto Tecnológico de Informática, Universidad Politécnica de Valencia;Instituto Tecnológico de Informática, Universidad Politécnica de Valencia;Instituto Tecnológico de Informática, Universidad Politécnica de Valencia
Venue:
IbPRIA '07 Proceedings of the 3rd Iberian conference on Pattern Recognition and Image Analysis, Part I
Year:
2007

Abstract

Bilingual documentation has become common in official institutions and private companies, which makes the categorization of bilingual text a useful tool. This paper proposes several approaches to this bilingual classification task. On the one hand, three finite-state transducer algorithms from the grammatical inference framework are presented. On the other hand, a naive combination of smoothed n-gram models is introduced. To evaluate the performance of the bilingual classifiers, two categorized bilingual corpora of different complexity were considered. Experiments on a limited-domain task show that all the models obtain similar results. However, results on a more open-domain task indicate that the naive approach performs best.
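
The abstract does not spell out how the smoothed n-gram models are combined, so the following is only an illustrative sketch and not the authors' formulation. It assumes that "naive" means treating the two languages as conditionally independent given the class, so that a class score is the sum of the log prior and one per-language model log-probability. It also simplifies to unigrams (n = 1) with add-one smoothing. The names `SmoothedUnigram` and `NaiveBilingualClassifier` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SmoothedUnigram:
    """Unigram language model with add-one (Laplace) smoothing (assumed choice)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, tokens):
        self.counts.update(tokens)
        self.total += len(tokens)

    def log_prob(self, tokens, vocab_size):
        # One extra vocabulary slot is reserved for words never seen in training.
        denom = self.total + vocab_size + 1
        return sum(math.log((self.counts[t] + 1) / denom) for t in tokens)


class NaiveBilingualClassifier:
    """Hypothetical classifier: one language model per class and per language."""

    def __init__(self):
        self.src = defaultdict(SmoothedUnigram)  # class -> source-language model
        self.tgt = defaultdict(SmoothedUnigram)  # class -> target-language model
        self.class_counts = Counter()
        self.src_vocab = set()
        self.tgt_vocab = set()

    def fit(self, samples):
        # samples: iterable of (source_text, target_text, label) triples
        for src_text, tgt_text, label in samples:
            s, t = src_text.lower().split(), tgt_text.lower().split()
            self.src[label].update(s)
            self.tgt[label].update(t)
            self.class_counts[label] += 1
            self.src_vocab.update(s)
            self.tgt_vocab.update(t)
        return self

    def predict(self, src_text, tgt_text):
        s, t = src_text.lower().split(), tgt_text.lower().split()
        n = sum(self.class_counts.values())

        def score(c):
            # Assumed independence: log P(c) + log P(source | c) + log P(target | c)
            return (math.log(self.class_counts[c] / n)
                    + self.src[c].log_prob(s, len(self.src_vocab))
                    + self.tgt[c].log_prob(t, len(self.tgt_vocab)))

        return max(self.class_counts, key=score)
```

Under these assumptions, training takes a list of (source sentence, target sentence, label) triples, and `predict` returns the label with the highest combined score for a new sentence pair. Higher-order n-grams or a different smoothing scheme would change only the `SmoothedUnigram` class.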