Training pool selection for semi-supervised learning

Authors:
Jian Ge;Tinghuai Ma;Qiaoqiao Yan;Yonggang Yan;Wei Tian
Affiliations:
Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, China, College of Computer & Software, Nanjing University of Information S ...
Venue:
ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I
Year:
2012

Abstract

Semi-supervised learning deals with methods for automatically exploiting unlabeled samples in addition to a labeled set. Data selection is an important topic in active learning: it addresses how to select the valuable unlabeled data to label, given that labeling data is costly. In this paper, we discuss in detail three aspects of data selection: how to select unlabeled samples, how many unlabeled samples to select, and how to define the capacity of the training pool. Experiments using self-training based on C4.5 show that as the labeled ratio L increases, the initial error becomes smaller. Also, when the labeled ratio L is less than 10%, the selection ratio should be set below 0.8. The error shows no significant change when the selection ratio is larger than 1.0.
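
The self-training loop mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. A one-dimensional decision stump stands in for C4.5, and the names `selection_ratio` and `pool_capacity` are hypothetical: here the selection ratio is assumed to mean the number of unlabeled samples added per round relative to the size of the initial labeled set, and the pool capacity is assumed to be a cap on the total size of the training pool. The abstract does not give the paper's exact definitions.

```python
def fit_stump(points):
    """Fit a 1-D threshold classifier (a depth-1 tree, standing in for C4.5)."""
    xs = sorted(x for x, _ in points)
    # Candidate thresholds: midpoints between consecutive sorted values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in candidates:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= t else 1 - left_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return t, left_label


def predict(stump, x):
    """Return the class (0 or 1) that the stump assigns to x."""
    t, left_label = stump
    return left_label if x <= t else 1 - left_label


def confidence(stump, x):
    """Crude confidence proxy: distance of x from the decision threshold."""
    return abs(x - stump[0])


def self_train(labeled, unlabeled, selection_ratio, pool_capacity):
    """Grow the training pool with the most confidently pseudo-labeled samples.

    Assumption: each round adds selection_ratio * len(labeled) samples,
    and the pool never exceeds pool_capacity samples in total.
    """
    pool = list(labeled)
    remaining = list(unlabeled)
    batch = max(1, int(selection_ratio * len(labeled)))
    stump = fit_stump(pool)
    while remaining and len(pool) < pool_capacity:
        # Most confident predictions first.
        remaining.sort(key=lambda x: confidence(stump, x), reverse=True)
        take = min(batch, pool_capacity - len(pool))
        chosen, remaining = remaining[:take], remaining[take:]
        # Pseudo-label the chosen samples and retrain on the enlarged pool.
        pool.extend((x, predict(stump, x)) for x in chosen)
        stump = fit_stump(pool)
    return stump, pool


if __name__ == "__main__":
    labeled = [(0.1, 0), (0.9, 1)]
    unlabeled = [0.0, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 1.0]
    stump, pool = self_train(labeled, unlabeled, selection_ratio=1.0, pool_capacity=8)
    print("threshold:", stump[0], "pool size:", len(pool))
```

Under these assumptions, a smaller `selection_ratio` adds fewer pseudo-labeled samples per round, so the classifier is retrained more often before low-confidence samples enter the pool, while `pool_capacity` bounds how far self-training proceeds.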