Different approaches to bilingual text classification based on grammatical inference techniques

Authors:
Jorge Civera;Elsa Cubel;Alfons Juan;Enrique Vidal
Affiliations:
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia;Instituto Tecnológico de Informática, Universidad Politécnica de Valencia;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia;Instituto Tecnológico de Informática, Universidad Politécnica de Valencia
Venue:
IbPRIA'05 Proceedings of the Second Iberian conference on Pattern Recognition and Image Analysis - Volume Part II
Year:
2005

Abstract

Bilingual documentation has become common in many official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool that can also be applied in machine translation. To tackle this classification task, several approaches are proposed. On the one hand, two finite-state transducer algorithms from the grammatical inference domain are discussed. On the other hand, the well-known naive Bayes approximation is presented, along with a possible model based on n-gram language models. Experiments on a bilingual corpus demonstrate the adequacy of these methods and the relevance of a second information source in text classification, as supported by the classification error rates. A relative reduction of 29% in classification error is obtained with respect to the best previous results on the monolingual version of the same task.
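
As a rough illustration of the naive Bayes idea mentioned in the abstract, the sketch below classifies a sentence pair by combining a class prior with one class-conditional language model per language. It is not the authors' implementation. It assumes unigram models (n = 1), add-one smoothing, and independence of the two languages given the class, none of which is stated in the abstract. The class name `BilingualNaiveBayes`, its methods, and the toy sentence pairs are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class BilingualNaiveBayes:
    """Hypothetical sketch: per-class unigram models for each language."""

    def __init__(self):
        self.class_counts = Counter()
        # word_counts[side][label] maps each word to its frequency in that class
        self.word_counts = {"src": defaultdict(Counter), "tgt": defaultdict(Counter)}
        self.vocab = {"src": set(), "tgt": set()}

    def fit(self, pairs, labels):
        for (src, tgt), label in zip(pairs, labels):
            self.class_counts[label] += 1
            for side, sentence in (("src", src), ("tgt", tgt)):
                words = sentence.lower().split()
                self.word_counts[side][label].update(words)
                self.vocab[side].update(words)
        return self

    def _log_likelihood(self, side, sentence, label):
        counts = self.word_counts[side][label]
        total = sum(counts.values())
        # Assumption: add-one smoothing, with one extra slot for unseen words
        vocab_size = len(self.vocab[side]) + 1
        return sum(
            math.log((counts[w] + 1) / (total + vocab_size))
            for w in sentence.lower().split()
        )

    def predict(self, src, tgt):
        n = sum(self.class_counts.values())

        def score(label):
            # log P(c) + log P(src | c) + log P(tgt | c)
            return (
                math.log(self.class_counts[label] / n)
                + self._log_likelihood("src", src, label)
                + self._log_likelihood("tgt", tgt, label)
            )

        return max(self.class_counts, key=score)


# Made-up toy data, for illustration only
pairs = [
    ("una habitación doble", "a double room"),
    ("una habitación con vistas al mar", "a room with a sea view"),
    ("quiero pagar la cuenta", "I want to pay the bill"),
    ("la cuenta por favor", "the bill please"),
]
labels = ["room", "room", "bill", "bill"]

model = BilingualNaiveBayes().fit(pairs, labels)
print(model.predict("una habitación individual", "a single room"))
```

Dropping the `tgt` term from `score` gives a monolingual classifier over the same data, which is one simple way to see the effect of the second information source that the abstract refers to.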