Domain adaptation for text categorization by feature labeling

Authors:
Cristina Kadar; José Iria
Affiliations:
IBM Research Zurich, Rüschlikon, Switzerland (both authors)
Venue:
ECIR '11: Proceedings of the 33rd European Conference on Advances in Information Retrieval
Year:
2011

Abstract

We present a novel approach to domain adaptation for text categorization that requires only that the source domain data be weakly annotated in the form of labeled features. The main advantage of our approach is that labeling words is less expensive than labeling documents. We propose two methods. The first seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second augments the set of labeled features in an unsupervised way, by discovering a latent concept space shared between source and target. We empirically show that our approach outperforms standard supervised and semi-supervised methods and obtains results competitive with those reported by state-of-the-art domain adaptation methods, while requiring considerably less supervision.
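The abstract does not spell out a training objective, so the following is only an illustration of what "learning from labeled features" can look like. One established technique is generalized expectation (GE) criteria (Druck, Mann and McCallum, SIGIR 2008). Each labeled word carries a reference label distribution, and a logistic regression is fitted on unlabeled documents so that its average predicted label distribution over the documents containing that word matches the reference, measured by KL divergence. The sketch below implements that swapped-in technique with NumPy; it is not necessarily the authors' method. The toy vocabulary, the 0.9/0.1 reference distributions, and the hyperparameters (`l2`, `lr`, `steps`) are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ge_objective_and_grad(theta, X, labeled, ref, l2):
    """GE-style objective: sum over labeled features of KL(ref || model expectation).

    theta:   (K, V) weights of a multinomial logistic regression
    X:       (N, V) term-presence matrix of UNLABELED documents
    labeled: word indices of the labeled features
    ref:     (F, K) reference label distributions, one row per labeled
             feature; entries must be strictly positive (hypothetical values)
    """
    P = softmax(X @ theta.T)                 # (N, K) model posteriors p(y|x)
    obj = 0.5 * l2 * np.sum(theta ** 2)      # Gaussian prior on the weights
    grad = l2 * theta
    for f, k in enumerate(labeled):
        mask = X[:, k] > 0                   # documents containing feature k
        if not mask.any():
            continue                         # feature unseen: no constraint
        Pk, Xk = P[mask], X[mask]
        p_tilde = Pk.mean(axis=0)            # model's label distribution for feature k
        r = ref[f]
        obj += np.sum(r * (np.log(r) - np.log(p_tilde)))  # KL(r || p_tilde)
        # dKL/dtheta_c = -mean_x p(c|x) * (w_c - sum_y w_y p(y|x)) * x, with w = r / p_tilde
        w = r / p_tilde
        s = Pk @ w
        G = Pk * (w[None, :] - s[:, None])
        grad -= G.T @ Xk / Xk.shape[0]
    return obj, grad

def train_ge(X, labeled, ref, n_classes, l2=0.01, lr=0.5, steps=500):
    # Plain gradient descent from zero weights (uniform posteriors).
    theta = np.zeros((n_classes, X.shape[1]))
    for _ in range(steps):
        _, grad = ge_objective_and_grad(theta, X, labeled, ref, l2)
        theta -= lr * grad
    return theta

def predict(theta, X):
    # Most probable class per document.
    return np.argmax(X @ theta.T, axis=1)

# Toy "target domain": unlabeled documents over a 7-word vocabulary.
vocab = ["goal", "match", "team", "vote", "election", "party", "the"]
docs = [["goal", "match", "team"], ["match", "team", "the"],
        ["vote", "election", "party"], ["election", "party", "the"],
        ["goal", "team"], ["vote", "party"]]
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for n, d in enumerate(docs):
    for w in d:
        X[n, index[w]] = 1.0

# Labeled features only (class 0 = sports, class 1 = politics); no document labels.
labeled = [index["goal"], index["vote"]]
ref = np.array([[0.9, 0.1], [0.1, 0.9]])

theta = train_ge(X, labeled, ref, n_classes=2)
print(predict(theta, X))  # prints [0 0 1 1 0 1]
```

Documents 1 and 3 contain no labeled word, yet they are classified through co-occurring words (`match`, `team`; `election`, `party`) whose weights are learned from the labeled-feature constraints. The second method in the abstract, which augments the labeled feature set via a shared latent concept space, could be thought of as adding entries to `labeled` and `ref` before this training step; that part is not shown here.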