This paper improves a method of sample selection based on maximum entropy. Unlike the original method, the improved one takes the probability distribution of the unlabeled instances into consideration: it selects the instances that reduce the uncertainty of the whole unlabeled set to the greatest extent. The uncertainty reduction caused by an instance is measured by the instance's own uncertainty together with its influence index on the whole unlabeled set. To compute the influence index conveniently, we introduce a similarity matrix whose elements are similarities derived from the distances between instances. The new method avoids a drawback of the original method, which may select abnormal or isolated samples; it therefore selects instances that are more representative and more robust to noise. Our experimental results show that a classifier built from samples selected by the new algorithm outperforms one built from samples selected by the original method, at the same time complexity.
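A minimal sketch of such a density-weighted entropy criterion is given below, assuming an RBF-style similarity computed from pairwise Euclidean distances and a simple product of entropy and mean similarity as the combined score; the abstract does not specify the exact weighting, and the function name and parameters here are hypothetical illustrations rather than the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def select_informative_sample(X_unlabeled, probs, gamma=1.0):
    """Pick one unlabeled instance by combining its entropy-based
    uncertainty with an influence index from a similarity matrix.

    X_unlabeled: (n, d) feature matrix of the unlabeled pool.
    probs:       (n, k) predicted class probabilities for the pool.
    gamma:       bandwidth of the assumed RBF similarity (hypothetical).
    """
    # Uncertainty of each instance: entropy of its predicted distribution.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # Similarity matrix: similarities measured from pairwise distances
    # (an RBF transform of Euclidean distance is assumed here).
    dists = cdist(X_unlabeled, X_unlabeled)
    similarity = np.exp(-gamma * dists ** 2)

    # Influence index: how strongly an instance relates to the rest of
    # the pool; isolated or abnormal points receive a low value.
    influence = similarity.mean(axis=1)

    # Score each candidate by uncertainty weighted by influence, so that
    # highly uncertain but isolated points are not favored (assumed form).
    scores = entropy * influence
    return int(np.argmax(scores))
```

The key design point this sketch illustrates is that the influence index suppresses outliers: an instance far from the rest of the pool has small similarities to the other instances, so even a high entropy value cannot make it the selected sample.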