Cross-language text classification using structural correspondence learning

Authors:
Peter Prettenhofer;Benno Stein
Affiliations:
Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
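
The idea of inducing cross-lingual word correspondences from unlabeled documents and a word translation oracle can be illustrated with a small sketch. This is not the authors' implementation. It assumes a joint bag-of-words vocabulary over both languages, takes the pivot pairs (a source word plus its translation from the oracle) as given, substitutes closed-form ridge regression for whatever linear learner the paper actually uses, and runs on hypothetical toy data. The name learn_projection and all of its parameters are illustrative.

```python
import numpy as np


def learn_projection(X_unlabeled, pivot_pairs, k, reg=1.0):
    """Sketch of a structural-correspondence-style projection.

    X_unlabeled : (n_docs, n_features) bag-of-words matrix over the joint
                  source+target vocabulary (assumption of this sketch).
    pivot_pairs : list of (source_idx, target_idx) feature-index pairs,
                  i.e. a word and its oracle translation.
    k           : number of projection dimensions, k <= len(pivot_pairs).
    reg         : ridge regularization strength (hypothetical default).
    Returns theta with shape (k, n_features).
    """
    n_docs, n_feats = X_unlabeled.shape
    W = np.zeros((n_feats, len(pivot_pairs)))
    for j, (s, t) in enumerate(pivot_pairs):
        # Label is +1 if either word of the pivot pair occurs in the document.
        occurs = (X_unlabeled[:, s] > 0) | (X_unlabeled[:, t] > 0)
        y = np.where(occurs, 1.0, -1.0)
        # Mask the pivot's own features so the predictor must rely on context.
        X = X_unlabeled.copy()
        X[:, [s, t]] = 0.0
        # Ridge regression in closed form: (X^T X + reg * I) w = X^T y.
        A = X.T @ X + reg * np.eye(n_feats)
        W[:, j] = np.linalg.solve(A, X.T @ y)
    # The top-k left singular vectors of W span the shared subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k].T


# Hypothetical toy vocabulary:
# 0 good, 1 bad, 2 excellent, 3 gut, 4 schlecht, 5 hervorragend
X_unlabeled = np.array([
    [1, 0, 1, 0, 0, 0],  # English: good excellent
    [0, 1, 0, 0, 0, 0],  # English: bad
    [1, 0, 1, 0, 0, 0],  # English: good excellent
    [0, 0, 0, 1, 0, 1],  # German: gut hervorragend
    [0, 0, 0, 0, 1, 0],  # German: schlecht
    [0, 0, 0, 1, 0, 1],  # German: gut hervorragend
], dtype=float)
pivot_pairs = [(0, 3), (1, 4)]  # (good, gut), (bad, schlecht)

theta = learn_projection(X_unlabeled, pivot_pairs, k=2)
Z = X_unlabeled @ theta.T  # documents in the shared k-dimensional space
```

In this toy setting the non-pivot words "excellent" and "hervorragend" receive identical columns in theta, because they co-occur with the two halves of the same pivot pair in the same way. A classifier trained on projected source-language documents could then be applied to projected target-language documents; that final step is omitted here.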