Elements of information theory
Elements of information theory
A training algorithm for optimal margin classifiers
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Representation and learning in information retrieval
Representation and learning in information retrieval
C4.5: programs for machine learning
C4.5: programs for machine learning
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Enhanced word clustering for hierarchical text classification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A generalized maximum entropy approach to bregman co-clustering and matrix approximation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Cross-language text classification
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
An EM Based Training Algorithm for Cross-Language Text Categorization
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
Advanced learning algorithms for cross-language patent retrieval and classification
Information Processing and Management: an International Journal
Exploring in the weblog space by detecting informative and affective articles
Proceedings of the 16th international conference on World Wide Web
Automatic search engine performance evaluation with click-through data analysis
Proceedings of the 16th international conference on World Wide Web
Cross language text categorization by acquiring multilingual domain models from comparable corpora
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Cross-lingual query classification: a preliminary study
Proceedings of the 2nd ACM workshop on Improving non english web searching
Cross-language query classification using web search for exogenous knowledge
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Latent space domain transfer between high dimensional overlapping distributions
Proceedings of the 18th international conference on World wide web
Heterogeneous transfer learning for image clustering via the social web
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Co-training for cross-lingual sentiment classification
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Efficient Text Classification Using Term Projection
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Transfer Learning beyond Text Classification
ACML '09 Proceedings of the 1st Asian Conference on Machine Learning: Advances in Machine Learning
Expert Systems with Applications: An International Journal
Cross-language text classification using structural correspondence learning
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Three challenges in data mining
Frontiers of Computer Science in China
Using information from the target language to improve crosslingual text classification
IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Cross lingual text classification by mining multilingual topics from wikipedia
Proceedings of the fourth ACM international conference on Web search and data mining
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Cross-lingual sentiment classification via bi-view non-negative matrix tri-factorization
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Cross-Lingual Adaptation Using Structural Correspondence Learning
ACM Transactions on Intelligent Systems and Technology (TIST)
Bilingual co-training for sentiment classification of chinese product reviews
Computational Linguistics
Transfer learning for cross-company software defect prediction
Information and Software Technology
Cross-lingual text classification with model translation and document translation
Proceedings of the 50th Annual Southeast Regional Conference
Research on text categorization based on a weakly-supervised transfer learning method
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Bi-weighting domain adaptation for cross-language text classification
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Two
Generalized canonical correlation analysis for disparate data fusion
Pattern Recognition Letters
A Comparative Study of Cross-Lingual Sentiment Classification
WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Cross-lingual web spam classification
Proceedings of the 22nd international conference on World Wide Web companion
Efficiency investigation of manifold matching for text document classification
Pattern Recognition Letters
Hi-index | 0.00 |
As the World Wide Web in China grows rapidly, mining knowledge in Chinese Web pages becomes more and more important. Mining Web information usually relies on the machine learning techniques which require a large amount of labeled data to train credible models. Although the number of Chinese Web pages increases quite fast, it still lacks Chinese labeled data. However, there are relatively sufficient English labeled Web pages. These labeled data, though in different linguistic representations, share a substantial amount of semantic information with Chinese ones, and can be utilized to help classify Chinese Web pages. In this paper, we propose an information bottleneck based approach to address this cross-language classification problem. Our algorithm first translates all the Chinese Web pages to English. Then, all the Web pages, including Chinese and English ones, are encoded through an information bottleneck which can allow only limited information to pass. Therefore, in order to retain as much useful information as possible, the common part between Chinese and English Web pages is inclined to be encoded to the same code (i.e. class label), which makes the cross-language classification accurate. We evaluated our approach using the Web pages collected from Open Directory Project (ODP). The experimental results show that our method significantly improves several existing supervised and semi-supervised classifiers.