A new approach for semi-supervised online news classification

Authors:
Hon-Man Ko; Wai Lam
Affiliations:
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong (both authors)
Venue:
HSI'05 Proceedings of the 3rd international conference on Human Society@Internet: web and Communication Technologies and Internet-Related Social Issues
Year:
2005

Abstract

Due to the dramatic growth of information on the Web, text categorization has become a useful tool for organizing that information. The traditional text categorization problem uses a training set of text documents, collected from online sources, with pre-defined class labels. Typically, a large number of labeled online news documents must be provided in order to learn a satisfactory categorization scheme. We investigate a new way to alleviate this problem: for each category, only a small number of positive training examples, covering a set of major concepts associated with the category, is needed. We develop a technique that makes use of unlabeled documents, such as online news from the Web, since such documents can be collected easily. Our technique exploits the inherent structure of the set of positive training documents, guided by the concepts provided for the category. We also develop a training document adaptation algorithm that automatically seeks representative training examples from the unlabeled data collected from a new online source. Preliminary experiments on a real-world news collection have been conducted to demonstrate the effectiveness of our approach.
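The abstract describes the approach only at a high level. As a rough illustration, and not the authors' algorithm, the following sketch assumes that each major concept of a category is summarized by a prototype (the normalized centroid of its few positive examples), and that a "representative" unlabeled document is one whose cosine similarity to some concept prototype, over bag-of-words vectors, reaches a threshold. The function names (`vectorize`, `build_prototypes`, `select_representatives`), the `threshold` and `top_k` parameters, and the toy documents are all hypothetical.

```python
from collections import Counter
import math
import re


def vectorize(text):
    """L2-normalized term-frequency vector (a simplifying assumption; no IDF)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Sum of unit vectors re-normalized to unit length (same direction as the mean)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def build_prototypes(concept_examples):
    """Hypothetical helper: map each concept to the centroid of its positive examples."""
    return {concept: centroid([vectorize(d) for d in docs])
            for concept, docs in concept_examples.items()}


def select_representatives(prototypes, unlabeled, threshold=0.2, top_k=2):
    """Score each unlabeled doc by its best-matching concept prototype and
    return up to top_k (score, concept, doc) triples at or above threshold."""
    scored = []
    for doc in unlabeled:
        vec = vectorize(doc)
        concept, score = max(((c, cosine(vec, p)) for c, p in prototypes.items()),
                             key=lambda cs: cs[1])
        if score >= threshold:
            scored.append((score, concept, doc))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


# Toy usage: a "sports" category described by two concepts with two examples each.
sports = {
    "football": ["the striker scored a late goal in the football match",
                 "football club wins the league after a goal"],
    "tennis": ["tennis star wins the final set at the open",
               "the tennis champion served an ace in the final"],
}
unlabeled = [
    "a late goal sealed the football league title",
    "central bank raises interest rates again",
    "an ace in the final set decided the tennis open",
]
prototypes = build_prototypes(sports)
picked = select_representatives(prototypes, unlabeled, threshold=0.2, top_k=2)
for score, concept, doc in picked:
    print(f"{score:.2f} {concept}: {doc}")
```

In this toy run the finance headline shares no terms with either concept prototype, so it scores 0 and is never proposed as a training example for the sports category, while each sports headline is matched to the concept whose vocabulary it shares. A real system would likely add IDF weighting, stop-word removal, and an iterative loop that refines the prototypes with the selected documents.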