New Labeling Strategy for Semi-supervised Document Categorization

Authors:
Yan Zhu;Liping Jing;Jian Yu
Affiliations:
School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Venue:
KSEM '09 Proceedings of the 3rd International Conference on Knowledge Science, Engineering and Management
Year:
2009

Abstract

Semi-supervised learning usually requires some prior knowledge to supervise the learning process, such as seeds in Seeded-Kmeans, pairwise constraints in COP-Kmeans, and labeled data for training a useful initial classifier in S3VM. Such prior knowledge is generally provided by domain experts, so it is expensive to obtain. In this paper, we propose a new automatic document labeling strategy that derives much more prior knowledge from the very limited labeled data and the whole data set. Experimental results on the 20-Newsgroup text data show that the new strategy is helpful for semi-supervised document categorization and improves learning performance.
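
To make the role of prior knowledge concrete, the following is a minimal sketch of how labeled seeds steer a Seeded-Kmeans style clustering: the seeds only fix the initial centroids, after which ordinary k-means iterations run. It is an illustration under assumptions, not the labeling strategy proposed in the paper. The function name `seeded_kmeans`, its parameters, and the toy two-dimensional vectors are hypothetical, and Euclidean distance is assumed for simplicity.

```python
from math import dist


def seeded_kmeans(points, seeds, n_iter=20):
    """Cluster `points` with k-means whose centroids start from labeled seeds.

    points: list of equal-length numeric tuples (e.g. document vectors).
    seeds:  dict mapping a point index to its known cluster label 0..k-1;
            every label is assumed to have at least one seed.
    Returns one cluster label per point.
    """
    k = max(seeds.values()) + 1
    dim = len(points[0])

    def mean(indices):
        # Component-wise mean of the points at the given indices.
        return tuple(
            sum(points[i][d] for i in indices) / len(indices) for d in range(dim)
        )

    # Prior knowledge enters here: each initial centroid is the mean of the
    # seed points that carry that cluster's label.
    centroids = [
        mean([i for i, lab in seeds.items() if lab == c]) for c in range(k)
    ]

    labels = None
    for _ in range(n_iter):
        # Assignment step: every point (seeds included) goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels


# Toy data: two well-separated groups, one labeled seed per group.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
seeds = {0: 0, 3: 1}
labels = seeded_kmeans(points, seeds)
```

With only two labeled points out of six, the seeds decide which cluster index each group receives. The fewer seeds an expert can afford to label, the more the result depends on how representative they are, which is the cost the abstract refers to.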