SETRED: self-training with editing

Authors: Ming Li; Zhi-Hua Zhou
Affiliation (both authors): National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Venue: PAKDD'05: Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year: 2005

Abstract

Self-training is a semi-supervised learning algorithm in which a learner repeatedly labels unlabeled examples and retrains itself on the enlarged labeled training set. Because the self-training process may mislabel some unlabeled examples, the learned hypothesis sometimes performs poorly. This paper proposes a new algorithm, Setred, which uses a specific data editing method to identify and remove mislabeled examples from the self-labeled data. Specifically, in each iteration of the self-training process, the local cut edge weight statistic is used to estimate whether a newly labeled example is reliable, and only the reliable self-labeled examples are used to enlarge the labeled training set. Experiments show that introducing data editing is beneficial: the hypotheses learned by Setred outperform those learned by the standard self-training algorithm.
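The abstract describes the loop only at a high level, so the following Python sketch illustrates the idea under assumptions that are not taken from the paper. It uses a 1-nearest-neighbor base learner and a symmetric k-nearest-neighbor graph in place of whatever neighborhood graph Setred actually builds. Edge weights are assumed to be 1 / (1 + distance). The statistic J_i is taken as the summed weight of edges from example i to neighbors with a different label. Under the null hypothesis that labels are drawn independently from the class priors, an edge is cut with probability 1 - π_y, giving mean (1 - π_y) Σ_j w_ij and variance π_y (1 - π_y) Σ_j w_ij². A normal approximation supplies a right-tail p-value. All function and parameter names (`setred_sketch`, `cut_edge_pvalue`, `batch`, `k`, `theta`, `max_iter`) are hypothetical.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def nn_predict(labeled, x):
    """1-nearest-neighbor prediction (assumed base learner, not from the paper)."""
    return min(labeled, key=lambda p: dist(p[0], x))[1]


def knn_graph(points, k):
    """Symmetric k-NN graph over (features, label) pairs; edges[i] = neighbor indices.
    Assumption: stands in for the neighborhood graph used by the cut edge statistic."""
    edges = [set() for _ in points]
    for i, (xi, _) in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(xi, points[j][0]))
        for j in others[:k]:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def cut_edge_pvalue(points, edges, i, priors):
    """Right-tail p-value of J_i, the summed weight of edges from example i to
    neighbors with a different label, under H0: labels drawn from the class priors."""
    xi, yi = points[i]
    # Assumed edge weight: closer neighbors count more.
    weights = {j: 1.0 / (1.0 + dist(xi, points[j][0])) for j in edges[i]}
    j_obs = sum(w for j, w in weights.items() if points[j][1] != yi)
    p_cut = 1.0 - priors.get(yi, 0.0)  # probability that an edge is cut under H0
    mean = p_cut * sum(weights.values())
    var = p_cut * (1.0 - p_cut) * sum(w * w for w in weights.values())
    if var == 0.0:
        return 1.0 if j_obs <= mean else 0.0
    z = (j_obs - mean) / math.sqrt(var)  # normal approximation (assumption)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)


def setred_sketch(labeled, unlabeled, batch=2, k=3, theta=0.1, max_iter=10):
    """Self-training loop that keeps only self-labeled examples passing the test.
    `batch`, `k`, `theta`, and `max_iter` are hypothetical parameters."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        if not pool:
            break
        # Self-training step: label the pool points closest to the labeled set.
        pool.sort(key=lambda x: min(dist(x, p[0]) for p in labeled))
        chosen, pool = pool[:batch], pool[batch:]
        candidates = [(x, nn_predict(labeled, x)) for x in chosen]
        # Editing step: class priors come from the current labeled set.
        counts = Counter(y for _, y in labeled)
        priors = {c: n / len(labeled) for c, n in counts.items()}
        points = labeled + candidates
        edges = knn_graph(points, k)
        # Reject candidates with significantly more cut-edge weight than expected;
        # rejected examples are simply dropped in this sketch.
        kept = [points[i] for i in range(len(labeled), len(points))
                if cut_edge_pvalue(points, edges, i, priors) >= theta]
        labeled.extend(kept)
    return labeled
```

Consider two well-separated clusters, each with two labeled points and two nearby unlabeled points. Here all four unlabeled points pass the test and are added with their cluster's label. By contrast, an example whose neighbors all carry a different label gets a small p-value and is rejected. The one-sided, right-tail rejection rule and the choice to discard rejected examples are design choices of this sketch. Setred's exact decision rule may differ.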