This paper improves a method of sample selection based on maximum entropy. Unlike the original method, the improved one takes the probability distribution of the unlabeled instances into consideration: it selects the instances that reduce the uncertainty of the whole unlabeled set to the greatest extent. The uncertainty reduction caused by an instance is measured by the instance's own uncertainty together with its influence index on the whole unlabeled set. To compute the influence index conveniently, we introduce a similarity matrix whose elements are similarities derived from the distances between instances. The new method avoids a drawback of the original method, which may select abnormal or isolated samples; it therefore selects instances that are more representative and more robust to noise. Our experimental results show that a classifier built from samples selected by the new algorithm outperforms one built from samples selected by the original method, at the same time complexity.
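A minimal sketch of such a density-weighted entropy criterion is given below, assuming an RBF-style similarity computed from pairwise Euclidean distances and a simple product of entropy and mean similarity as the combined score; the abstract does not specify the exact weighting, and the function name and parameters here are hypothetical illustrations rather than the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def select_informative_sample(X_unlabeled, probs, gamma=1.0):
    """Pick one unlabeled instance by combining its entropy-based
    uncertainty with an influence index from a similarity matrix.

    X_unlabeled: (n, d) feature matrix of the unlabeled pool.
    probs:       (n, k) predicted class probabilities for the pool.
    gamma:       bandwidth of the assumed RBF similarity (hypothetical).
    """
    # Uncertainty of each instance: entropy of its predicted distribution.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # Similarity matrix: similarities measured from pairwise distances
    # (an RBF transform of Euclidean distance is assumed here).
    dists = cdist(X_unlabeled, X_unlabeled)
    similarity = np.exp(-gamma * dists ** 2)

    # Influence index: how strongly an instance relates to the rest of
    # the pool; isolated or abnormal points receive a low value.
    influence = similarity.mean(axis=1)

    # Score each candidate by uncertainty weighted by influence, so that
    # highly uncertain but isolated points are not favored (assumed form).
    scores = entropy * influence
    return int(np.argmax(scores))
```

The key design point this sketch illustrates is that the influence index suppresses outliers: an instance far from the rest of the pool has small similarities to the other instances, so even a high entropy value cannot make it the selected sample.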