Cross-Lingual Adaptation Using Structural Correspondence Learning

  • Authors:
  • Peter Prettenhofer; Benno Stein

  • Affiliations:
  • Bauhaus-Universität Weimar (both authors)

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST)
  • Year:
  • 2011

Abstract

Cross-lingual adaptation is a special case of domain adaptation and refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation in the context of text classification. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce a cross-lingual representation that enables the transfer of classification knowledge from the source to the target language. The main advantages of this method over existing methods are resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual word correspondences.
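The core construction the abstract describes — predicting pivot features (word/translation pairs) from unlabeled text in both languages, then compressing the pivot-predictor weights with an SVD into a shared cross-lingual representation — can be sketched as follows. This is a minimal illustration, not the authors' implementation: ridge-regularized least squares stands in for the paper's linear classifiers, and all function and variable names are hypothetical.

```python
import numpy as np

def cl_scl_projection(X_unlabeled, pivot_cols, k=2):
    """Sketch of an SCL-style cross-lingual projection.

    X_unlabeled : (n_docs, n_feats) term matrix pooling unlabeled
                  documents from both languages
    pivot_cols  : feature indices of the pivots (word/translation pairs)
    k           : dimensionality of the induced representation
    Returns theta, a (k, n_feats) projection onto the shared subspace.
    """
    n_feats = X_unlabeled.shape[1]
    W = np.zeros((n_feats, len(pivot_cols)))
    for j, p in enumerate(pivot_cols):
        # Binary target: does the pivot occur in the document?
        y = (X_unlabeled[:, p] > 0).astype(float)
        X = X_unlabeled.copy()
        X[:, p] = 0.0  # mask the pivot so it cannot predict itself
        # Ridge regression as a stand-in for the paper's pivot classifiers
        A = X.T @ X + 1e-2 * np.eye(n_feats)
        W[:, j] = np.linalg.solve(A, X.T @ y)
    # Top-k left singular vectors of the weight matrix span the
    # subspace that captures correlations with the pivots.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :k].T
    return theta

# Tiny synthetic usage: 20 pooled documents, 6 features, pivots 0 and 1.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(20, 6)).astype(float)
theta = cl_scl_projection(X, pivot_cols=[0, 1], k=2)
z = X @ theta.T  # cross-lingual document representations
```

A classifier trained on `z` for labeled source-language documents can then be applied directly to target-language documents projected through the same `theta`, which is what enables the knowledge transfer the abstract refers to.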