Knowledge Supervised Text Classification with No Labeled Documents

  • Authors:
  • Congle Zhang;Gui-Rong Xue;Yong Yu

  • Affiliations:
  • Apex Lab, Shanghai Jiaotong University, Shanghai, 200240 and State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058;Apex Lab, Shanghai Jiaotong University, Shanghai, 200240 and State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058;No Affiliations,

  • Venue:
  • PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
  • Year:
  • 2008

Abstract

In traditional text classification approaches, the semantic meanings of the classes are described by labeled documents. Since labeling documents is often time-consuming and expensive, a promising alternative is to ask users to provide a few keywords that describe the classes instead of labeling any documents. However, a short list of keywords may not contain enough information and may therefore lead to an unreliable classifier. Fortunately, a large amount of public data is easily available in web directories such as ODP and Wikipedia. We are interested in exploiting the enormous crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning" (KSL), which uses the knowledge in keywords together with crowd intelligence to learn a classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem. It optimizes the expected prediction risk and builds a high-quality classifier. Empirical results support our claim: our algorithm achieves a Micro-F1 above 0.9 on average, which is much better than the baselines and even comparable to an SVM classifier trained on labeled documents.
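
For readers unfamiliar with the reported metric, the following is a minimal sketch of how Micro-F1 can be computed. It assumes a single-label, multi-class setting, which the abstract does not state explicitly. The function name `micro_f1` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label, multi-class predictions.

    Counts are pooled over all classes before precision and recall
    are combined, so frequent classes weigh more than rare ones.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    tp_sum = sum(tp.values())
    fp_sum = sum(fp.values())
    fn_sum = sum(fn.values())

    # F1 = 2TP / (2TP + FP + FN), using the pooled counts.
    denom = 2 * tp_sum + fp_sum + fn_sum
    return 2 * tp_sum / denom if denom else 0.0


# Hypothetical labels, for illustration only.
y_true = ["sports", "sports", "politics", "tech", "tech"]
y_pred = ["sports", "politics", "politics", "tech", "sports"]
score = micro_f1(y_true, y_pred)
print(f"Micro-F1: {score:.2f}")
```

Under the single-label assumption every misclassified document contributes exactly one false positive and one false negative, so Micro-F1 coincides with plain accuracy. In a multi-label evaluation the two would differ.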