Knowledge Supervised Text Classification with No Labeled Documents

Authors:
Congle Zhang;Gui-Rong Xue;Yong Yu
Affiliations:
Apex Lab, Shanghai Jiaotong University, Shanghai, 200240 and State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058;Apex Lab, Shanghai Jiaotong University, Shanghai, 200240 and State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058;No Affiliations,
Venue:
PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Year:
2008

Citing 16
Cited 1

A sequential algorithm for training text classifiers

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Analyzing the effectiveness and applicability of co-training

Proceedings of the ninth international conference on Information and knowledge management
Document language models, query models, and risk minimization for information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Incorporating Prior Knowledge into Boosting

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Statistical Models for Co-occurrence Data

Statistical Models for Co-occurrence Data
Incorporating prior knowledge with weighted margin support vector machines

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Constructing informative priors using transfer learning

ICML '06 Proceedings of the 23rd international conference on Machine learning
Document clustering with prior knowledge

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Constructing informative prior distributions from domain knowledge in text classification

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Text classification by labeling words

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Overcoming the brittleness bottleneck using wikipedia: enhancing text categorization with encyclopedic knowledge

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
InterActive feature selection

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Feature generation for text categorization using world knowledge

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Transductive learning for text classification using explicit knowledge models

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

Large-scale hierarchical text classification without labelled data

Proceedings of the fourth ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

In traditional text classification approaches, the semantic meanings of the classes are described by the labeled documents. Since labeling documents is often time consuming and expensive, it is a promising idea that asking users to provide some keywords to depict the classes, instead of labeling any documents. However, short pieces of keywords may not contain enough information and therefore may lead to unreliable classifier. Fortunately, there are large amount of public data easily available in web directories, such as ODP, Wikipedia, etc. We are interested in exploring the enormous crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning "(KSL), which utilizes the knowledge in keywords and the crowd intelligence to learn the classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem. It can optimize the expected prediction risk and build the high quality classifier. Empirical results verify our claim: our algorithm can achieve above 0.9 on Micro-F1 on average, which is much better than baselines and even comparable against SVM classifier supervised by labeled documents.