Many languages, especially less commonly used ones, lack sufficient labeled Web pages, which makes it difficult for traditional supervised methods to achieve satisfactory Web page classification performance. To address this, we propose a novel Dual Knowledge Transfer (DKT) approach for cross-language Web page classification, built on Nonnegative Matrix Tri-factorization (NMTF) and motivated by two observations. First, Web pages on the same topic in different languages usually share common semantic patterns, even though those patterns are expressed in different representation forms. Second, the associations between word clusters and Web page classes are a more reliable carrier than raw words for transferring knowledge across languages. Based on these observations, we transfer knowledge from the auxiliary language, in which abundant labeled Web pages are available, to the target language, in which we want to classify Web pages, through two paths: word cluster approximations and the associations between word clusters and Web page classes. Because these two knowledge transfer paths reinforce each other, our approach can achieve better classification accuracy. We evaluate the proposed approach in extensive experiments on a real-world cross-language Web page data set. The results are promising and consistent with our theoretical analysis.
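For readers unfamiliar with NMTF, the factorization underlying the approach can be illustrated with a minimal sketch. It assumes a nonnegative word-by-page matrix X approximated as F S G^T, where F groups words into clusters, G groups pages into classes, and S holds the associations between word clusters and page classes. The function name `nmtf`, the matrix names, the sizes, and the plain multiplicative updates are illustrative assumptions. The sketch omits the cross-language transfer terms and label constraints of the DKT approach, so it should not be read as the method proposed here.

```python
import numpy as np


def nmtf(X, k, c, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF sketch: approximate X (m x n, nonnegative) as F @ S @ G.T.

    Assumed roles (illustrative only):
      F (m x k): word-to-word-cluster memberships
      S (k x c): associations between word clusters and page classes
      G (n x c): page-to-class memberships
    Uses standard multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    F = rng.random((m, k)) + eps
    S = rng.random((k, c)) + eps
    G = rng.random((n, c)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a ratio of nonnegative terms,
        # so the factors stay nonnegative throughout.
        F *= (X @ G @ S.T) / (F @ (S @ G.T @ G @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ F.T @ F @ S) + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the reconstruction error after each full sweep.
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Usage on a small random nonnegative matrix (synthetic data, not Web pages).
X_demo = np.random.default_rng(1).random((20, 15))
F_demo, S_demo, G_demo, errors_demo = nmtf(X_demo, k=3, c=2)
predicted_class = G_demo.argmax(axis=1)  # one class index per column of X_demo
```

In a transfer setting of the kind described above, one would expect S (and an approximation of F) learned on the auxiliary language to inform the factorization of the target-language matrix. How that coupling is formulated is not specified in this abstract, so the sketch does not attempt it.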