Discriminative feature selection for multi-view cross-domain learning
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
A common challenge in predictive modeling for text mining is that the training data and the operational (testing) data are drawn from different underlying distributions, which poses great difficulty for many statistical learning methods. However, when the source-domain and target-domain distributions are related but not identical, there may exist a shared concept space that preserves the relation; a good feature representation can then encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes the concept space by a linear transformation under which we explicitly minimize the distribution difference between the source domain, which has sufficient labeled data, and the target domain, which has only unlabeled data, while simultaneously minimizing the empirical loss on the labeled source data. Another characteristic of our method is its ability to consider multiple classes and their interactions simultaneously. We conducted extensive experiments on two common text mining problems, namely information extraction and document classification, to demonstrate the effectiveness of the proposed method.
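The core idea of the abstract can be sketched numerically: learn a shared linear map into a concept space by jointly minimizing a classification loss on labeled source data and a distribution-discrepancy penalty between the projected source and target data. The sketch below is a minimal illustration under stated assumptions, not the paper's actual algorithm: it uses a simple squared difference of projected domain means as the discrepancy term, a squared hinge loss, and synthetic toy data; all dimensions, names, and hyperparameters are hypothetical.

```python
import numpy as np

# Assumed toy setup: a labeled source domain and an unlabeled, shifted target domain.
rng = np.random.default_rng(0)
n_s, n_t, d, k = 100, 80, 20, 5            # source/target sizes, input dim, concept dim
Xs = rng.normal(size=(n_s, d))             # labeled source domain
ys = np.where(Xs[:, 0] > 0, 1.0, -1.0)     # binary labels in {-1, +1}
Xt = rng.normal(loc=0.5, size=(n_t, d))    # unlabeled target domain (mean-shifted)

W = rng.normal(scale=0.1, size=(d, k))     # shared linear map into the concept space
w = np.zeros(k)                            # linear classifier in the concept space
lam, lr, steps = 1.0, 0.01, 300            # discrepancy weight, step size, iterations

def mean_gap(W):
    """Squared distance between projected domain means (a simple discrepancy proxy)."""
    diff = (Xs.mean(axis=0) - Xt.mean(axis=0)) @ W
    return float(diff @ diff)

gap_before = mean_gap(W)
dmu = Xs.mean(axis=0) - Xt.mean(axis=0)    # raw mean difference between domains

for _ in range(steps):
    Zs = Xs @ W                            # source data in the concept space
    hinge = np.maximum(0.0, 1.0 - ys * (Zs @ w))    # squared-hinge residuals
    coef = 2.0 * hinge * ys
    g_w = -(Zs.T @ coef) / n_s                       # gradient of source loss w.r.t. w
    g_W = (-np.outer(Xs.T @ coef, w) / n_s           # gradient of source loss w.r.t. W
           + lam * 2.0 * np.outer(dmu, dmu @ W))     # gradient of the discrepancy term
    w -= lr * g_w
    W -= lr * g_W

gap_after = mean_gap(W)
acc = float(np.mean(np.sign((Xs @ W) @ w) == ys))    # source-domain accuracy
```

After training, the projected mean gap shrinks while the classifier remains accurate on the source labels, which is the trade-off the joint objective encodes. The paper additionally handles multiple classes and their interactions under one shared transformation; a single binary classifier is used here only for brevity.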