As the amount of information on the Web grows dramatically, text categorization has become a useful tool for organizing it. The traditional text categorization problem assumes a training set of text documents, drawn from online sources, with pre-defined class labels. Typically, a large number of labeled online news documents must be provided in order to learn a satisfactory categorization scheme. We investigate an innovative way to alleviate this problem. For each category, only a small number of positive training examples are needed, covering a set of the major concepts associated with the category. We develop a technique that makes use of unlabeled documents, since such documents, for example online news from the Web, can be collected easily. Our technique exploits the inherent structure in the set of positive training documents, guided by the provided concepts of the category. An algorithm for training document adaptation is developed to automatically seek representative training examples from the unlabeled data collected from a new online source. Some preliminary experiments on a real-world news collection have been conducted to demonstrate the effectiveness of our approach.
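The abstract does not spell out how training document adaptation works, so the following is only a minimal sketch of the general idea: start from a few positive examples per concept, then pull similar documents out of an unlabeled pool. Everything specific here is an assumption made for illustration and is not taken from the paper: the bag-of-words representation, the per-concept centroid, the cosine similarity measure, the `threshold` parameter, and the function names `vectorize`, `cosine`, `centroid`, and `adapt_training_set`.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum the vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def adapt_training_set(seeds_by_concept, unlabeled_docs, threshold=0.3):
    """Select unlabeled documents that resemble one of a category's concepts.

    Hypothetical stand-in for the adaptation step: each concept is
    summarized by the centroid of its positive seed documents, and an
    unlabeled document is kept when its best cosine similarity to any
    centroid reaches `threshold` (an assumed, untuned value).
    Returns a list of (document, best_concept, score) triples.
    """
    centroids = {
        concept: centroid(vectorize(doc) for doc in docs)
        for concept, docs in seeds_by_concept.items()
    }
    if not centroids:
        return []
    selected = []
    for doc in unlabeled_docs:
        vec = vectorize(doc)
        concept, score = max(
            ((c, cosine(vec, cen)) for c, cen in centroids.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            selected.append((doc, concept, score))
    return selected
```

Called with a dictionary that maps each concept name to its few positive seed documents, plus a list of unlabeled news texts, `adapt_training_set` returns the unlabeled documents judged close enough to some concept, together with the matching concept and similarity score. A practical system would likely need term weighting, stop-word removal, and a tuned threshold, none of which this sketch attempts.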