Concept based text classification using labeled and unlabeled data

Authors:
Ping Gu;Qingsheng Zhu;Xiping He
Affiliations:
Dept. of Computer Science, Chongqing University, Chongqing, China;Dept. of Computer Science, Chongqing University, Chongqing, China;Dept. of Computer Science, Chongqing University, Chongqing, China
Venue:
ADMA'06 Proceedings of the Second International Conference on Advanced Data Mining and Applications
Year:
2006

Abstract

Recent work has shown that integrating conceptual features extracted from background knowledge improves text clustering and classification. In this paper we address the problem of text classification with both labeled and unlabeled data. We propose a Latent Bayes Ensemble model based on word-concept mapping and a transductive boosting method. Using knowledge extracted from ontologies, we aim to improve classification accuracy even with large amounts of unlabeled documents. We conducted several experiments on two well-known corpora and compare the results with those of Naïve Bayes and TSVM classifiers.