The World Wide Web has many document repositories that can act as valuable sources of additional data for various machine learning tasks. In this paper, we propose a method for improving text classification accuracy by using an additional corpus that can easily be obtained from the web. This additional corpus can be unlabeled and independent of the given classification task. The proposed method uses topic modeling to extract a set of topics from the additional corpus. The extracted topics then act as additional features for the data of the given classification task. An evaluation on the RCV1 dataset shows a significant improvement in classification accuracy over a baseline method.
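
The pipeline described above can be illustrated with a minimal sketch. The abstract says only that "topic modeling" is used, so the choice of LDA with collapsed Gibbs sampling, the fold-in step for unseen documents, the hyperparameters, the toy corpora, and all function names (`fit_lda`, `topic_features`, `augment`) are assumptions made for illustration, not the paper's actual implementation. The sketch fits topics on an unlabeled additional corpus, then appends each task document's inferred topic proportions to its bag-of-words vector.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized docs (assumed model)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per doc
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, topic_total, vocab


def topic_features(doc, model, num_topics, alpha=0.1, beta=0.01, steps=20):
    """Infer topic proportions for a new doc, keeping the topics fixed."""
    topic_word, topic_total, vocab = model
    V = len(vocab)
    known = set(vocab)
    # Words absent from the additional corpus carry no topic information here.
    counts = Counter(w for w in doc if w in known)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(steps):
        new = [alpha] * num_topics
        for w, n in counts.items():
            p = [
                theta[k] * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                for k in range(num_topics)
            ]
            s = sum(p)
            for k in range(num_topics):
                new[k] += n * p[k] / s
        total = sum(new)
        theta = [x / total for x in new]
    return theta


def augment(doc, task_vocab, model, num_topics):
    """Bag-of-words counts followed by the topic proportions as extra features."""
    counts = Counter(doc)
    bow = [float(counts[w]) for w in task_vocab]
    return bow + topic_features(doc, model, num_topics)


# Toy data, invented for illustration only.
NUM_TOPICS = 2
additional_corpus = [
    "stocks market shares trading prices".split(),
    "market prices rise shares fall".split(),
    "bank trading stocks prices".split(),
    "team match goal player season".split(),
    "player scores goal match win".split(),
    "season team win league match".split(),
]
task_docs = ["shares fall as market slows".split(), "team wins the league".split()]
task_vocab = sorted({w for doc in task_docs for w in doc})

model = fit_lda(additional_corpus, NUM_TOPICS)
features = [augment(doc, task_vocab, model, NUM_TOPICS) for doc in task_docs]
```

Each row of `features` can then be passed to any standard classifier in place of the plain bag-of-words vector. In practice a library topic model would likely replace the hand-written sampler; it is written out here only to keep the sketch free of dependencies.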