Exploiting comparable corpora and bilingual dictionaries for cross-language text categorization

Authors:
Alfio Gliozzo; Carlo Strapparava
Affiliations:
ITC-Irst, Trento, Italy; ITC-Irst, Trento, Italy
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Abstract

Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) when the system is trained on labeled documents in a source language (e.g. Italian). In this work we present several solutions, depending on which bilingual resources are available, and we show that the problem can be addressed even when no such resources are accessible. The core technique relies on the automatic acquisition of Multilingual Domain Models from comparable corpora. Experiments show the effectiveness of our approach, which provides a low-cost solution to the Cross-language Text Categorization task. In particular, when bilingual dictionaries are available, categorization performance approaches that of monolingual text categorization.
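
As a rough illustration of the core idea, the sketch below assumes that a Multilingual Domain Model can be approximated by a truncated SVD (latent semantic analysis) of a single term-by-document matrix built from comparable corpora in both languages, with words shared across languages (here, proper nouns) linking the two vocabularies. The abstract does not spell out the acquisition procedure, so this is an assumption. The toy corpus, the function names, and the nearest-centroid classifier are likewise hypothetical and chosen only for brevity.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Return (vocabulary index, term-by-document count matrix)."""
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d.split()}))}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.split():
            matrix[vocab[term], j] += 1.0
    return vocab, matrix


def acquire_domain_model(matrix, k):
    """Assumed acquisition step: keep the k strongest left singular vectors,
    giving a term-by-domain matrix shared by both languages."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k]


def embed(doc, vocab, domain_model):
    """Map a document into domain space; unknown terms are ignored."""
    vec = np.zeros(len(vocab))
    for term in doc.split():
        if term in vocab:
            vec[vocab[term]] += 1.0
    return vec @ domain_model


def classify(doc, vocab, domain_model, centroids):
    """Nearest-centroid stand-in for the classifier (cosine similarity)."""
    x = embed(doc, vocab, domain_model)

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    return max(centroids, key=lambda label: cosine(x, centroids[label]))


# Hypothetical unlabeled comparable corpus: Italian and English texts on the
# same topics. "juventus" and "nasdaq" appear in both languages.
comparable_corpus = [
    "juventus vince partita calcio",
    "calcio partita juventus gol",
    "juventus wins football match",
    "football match juventus goal",
    "nasdaq borsa mercato azioni",
    "borsa azioni nasdaq mercato crollo",
    "nasdaq stock market shares",
    "stock shares nasdaq market",
]
vocab, matrix = build_term_doc_matrix(comparable_corpus)
domain_model = acquire_domain_model(matrix, k=2)

# Labeled training documents exist only in the source language (Italian).
italian_training = {
    "sport": ["partita calcio gol", "juventus vince"],
    "finance": ["borsa azioni crollo", "mercato nasdaq"],
}
centroids = {
    label: np.mean([embed(d, vocab, domain_model) for d in docs], axis=0)
    for label, docs in italian_training.items()
}

# Target-language (English) documents share no words with the training set.
print(classify("football goal match", vocab, domain_model, centroids))
print(classify("stock shares market", vocab, domain_model, centroids))
```

In this toy setting the English test documents have no terms in common with the Italian training documents, so any correct assignment comes from the shared domain space alone. A bilingual dictionary, where available, could add further cross-language links by mapping translation pairs to the same row of the matrix; that variant is not shown here.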