Extracting discriminative concepts for domain adaptation in text mining

Authors:
Bo Chen;Wai Lam;Ivor Tsang;Tak-Lam Wong
Affiliations:
The Chinese University of Hong Kong, Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong, Hong Kong;Nanyang Technological University, Singapore, Singapore;The Chinese University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Abstract

A common predictive modeling challenge in text mining is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses great difficulty for many statistical learning methods. However, when the distributions in the source domain and the target domain are not identical but related, there may exist a shared concept space that preserves the relation. Consequently, a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by a linear transformation under which we explicitly minimize the distribution difference between the source domain, which has sufficient labeled data, and the target domains, which have only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain. Another characteristic of our method is its capability for considering multiple classes and their interactions simultaneously. We have conducted extensive experiments on two common text mining problems, namely information extraction and document classification, to demonstrate the effectiveness of our proposed method.
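
The kind of objective the abstract describes can be illustrated with a small sketch. This is not the paper's formulation: the abstract does not specify the distribution measure or the loss, so the sketch assumes a squared distance between projected domain means as the distribution difference and a squared error on one-hot labels as the empirical loss. All names (project, mean_discrepancy, source_loss, objective, W, V, lam) are hypothetical.

```python
# Illustrative only: a linear "concept space" W scored by
# (source loss) + lam * (source/target distribution gap).
# The gap and loss used here are assumptions, not the paper's definitions.

def project(X, W):
    """Map each row of X (n x d) through the linear map W (d x k)."""
    return [[sum(x[j] * W[j][c] for j in range(len(x)))
             for c in range(len(W[0]))] for x in X]

def column_means(Z):
    """Mean of each column of Z."""
    n = len(Z)
    return [sum(row[c] for row in Z) / n for c in range(len(Z[0]))]

def mean_discrepancy(Xs, Xt, W):
    """Squared distance between projected source and target means."""
    ms = column_means(project(Xs, W))
    mt = column_means(project(Xt, W))
    return sum((a - b) ** 2 for a, b in zip(ms, mt))

def source_loss(Xs, Ys, W, V):
    """Squared error of linear class scores on labeled source data.

    Ys holds one-hot rows, so all classes are scored jointly through V (k x c).
    """
    scores = project(project(Xs, W), V)
    total = sum((s - y) ** 2
                for srow, yrow in zip(scores, Ys)
                for s, y in zip(srow, yrow))
    return total / len(Ys)

def objective(Xs, Ys, Xt, W, V, lam=1.0):
    """Source loss plus lam times the distribution gap under W."""
    return source_loss(Xs, Ys, W, V) + lam * mean_discrepancy(Xs, Xt, W)

# Toy data: features 0 and 1 appear in both domains, feature 2 only in the source.
Xs = [[1, 0, 1], [0, 1, 1]]          # labeled source documents
Ys = [[1, 0], [0, 1]]                # one-hot class labels
Xt = [[1, 0, 0], [0, 1, 0]]          # unlabeled target documents
V = [[1, 0], [0, 1]]                 # class scorer on the concept space

W_shared = [[1, 0], [0, 1], [0, 0]]  # ignores the source-only feature
W_mixed = [[1, 0], [0, 1], [1, 1]]   # lets the source-only feature leak in

print(objective(Xs, Ys, Xt, W_shared, V))
print(objective(Xs, Ys, Xt, W_mixed, V))
```

On this toy data the transformation that ignores the source-only feature scores lower than the one that mixes it in, which is the behavior the abstract's intuition suggests. A real method would also need a procedure for optimizing W and V, which the abstract does not describe.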