Self-training is a semi-supervised learning algorithm in which a learner iteratively labels unlabeled examples and retrains itself on the enlarged labeled training set. Because the self-training process may label some unlabeled examples erroneously, the learned hypothesis sometimes performs poorly. This paper proposes a new algorithm, Setred, which employs a data editing method to identify and remove mislabeled examples from the self-labeled data. Specifically, in each iteration of self-training, a local cut edge weight statistic is used to estimate whether a newly labeled example is reliable, and only the reliable self-labeled examples are added to the labeled training set. Experiments show that introducing data editing is beneficial: the hypotheses learned by Setred outperform those learned by the standard self-training algorithm.
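The loop described above can be sketched in code. This is a minimal, illustrative sketch only: it uses a k-nearest-neighbor base learner and replaces Setred's cut edge weight hypothesis test with a much simpler proxy, a majority-agreement check among a candidate's nearest labeled neighbors. The function names (`setred_like_self_training`, `neighborhood_agrees`), the agreement threshold, and all parameters are assumptions of this sketch, not the paper's actual method.

```python
import math

def knn_predict(labeled, x, k=3):
    """Predict a label for x by majority vote of its k nearest labeled points."""
    neighbors = sorted(labeled, key=lambda p: math.dist(p[0], x))[:k]
    votes = {}
    for _, y in neighbors:
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)

def neighborhood_agrees(labeled, x, y_hat, k=3, min_agree=2):
    """Simplified data-editing step (stand-in for the cut edge weight test):
    accept a self-labeled example only if at least min_agree of its k nearest
    labeled neighbors carry the same label as the prediction."""
    neighbors = sorted(labeled, key=lambda p: math.dist(p[0], x))[:k]
    agree = sum(1 for _, y in neighbors if y == y_hat)
    return agree >= min_agree

def setred_like_self_training(labeled, unlabeled, rounds=5, k=3):
    """Self-training with editing: label unlabeled points, keep only the
    reliable self-labeled ones, enlarge the training set, and repeat."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        accepted, rest = [], []
        for x in pool:
            y_hat = knn_predict(labeled, x, k)
            if neighborhood_agrees(labeled, x, y_hat, k):
                accepted.append((x, y_hat))
            else:
                rest.append(x)  # unreliable: leave in the pool
        if not accepted:
            break  # no reliable candidates left; stop early
        labeled.extend(accepted)
        pool = rest
    return labeled
```

Standard self-training corresponds to dropping the `neighborhood_agrees` filter and accepting every newly labeled example; the editing step is precisely what keeps unreliable self-labeled points from contaminating the training set.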