An improved sample selection algorithm in fuzzy decision tree induction

  • Authors:
  • Ling-Cai Dong;Dan Wang;Xi-Zhao Wang

  • Affiliations:
  • College of Mathematics and Computer Science, Hebei University, Baoding, China;College of Mathematics and Computer Science, Hebei University, Baoding, China;College of Mathematics and Computer Science, Hebei University, Baoding, China

  • Venue:
  • SMC'09 Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics
  • Year:
  • 2009

Abstract

This paper improves a sample selection method based on maximum entropy. Unlike the original method, the improved one takes the probability distribution of the unlabeled instances into account and selects the instances that most reduce the uncertainty of the whole unlabeled set. The uncertainty reduction caused by an instance is measured by the instance's own uncertainty and its influence index on the whole unlabeled set. To calculate the influence index conveniently, we introduce a similarity matrix whose elements are similarities measured by the distances between instances. The new method avoids a drawback of the original method, namely that abnormal or isolated samples may be selected. It can therefore select instances that are more representative and more robust to noise. Our experimental results show that a classifier built from samples selected by the new algorithm performs better than one built from samples selected by the original method, at the same time complexity.
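
The abstract does not give the formulas, so the following is only a minimal sketch of the idea, not the authors' algorithm. It assumes that an instance's uncertainty is the Shannon entropy of its predicted class distribution, that similarity is 1 / (1 + Euclidean distance), that the influence index is the mean similarity to the other unlabeled instances, and that the two are combined by multiplication. The names `similarity_matrix` and `select_instance` and the toy data are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a class-membership distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def similarity_matrix(points):
    """Pairwise similarities derived from Euclidean distances.

    The mapping 1 / (1 + d) is an assumption; the abstract only says that
    similarities are measured by the distances between instances.
    """
    n = len(points)
    return [[1.0 / (1.0 + math.dist(points[i], points[j])) for j in range(n)]
            for i in range(n)]


def select_instance(points, class_probs):
    """Return (index, scores), where score = uncertainty * influence (assumed)."""
    sim = similarity_matrix(points)
    n = len(points)
    scores = []
    for i in range(n):
        # Assumed influence index: mean similarity to the other instances.
        influence = sum(sim[i][j] for j in range(n) if j != i) / max(n - 1, 1)
        scores.append(entropy(class_probs[i]) * influence)
    best = max(range(n), key=scores.__getitem__)
    return best, scores


# Toy unlabeled set: three clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
# Hypothetical class-membership estimates from the current classifier.
class_probs = [(0.8, 0.2), (0.55, 0.45), (0.7, 0.3), (0.5, 0.5)]

best, scores = select_instance(points, class_probs)
max_entropy_pick = max(range(len(points)), key=lambda i: entropy(class_probs[i]))
```

On this toy data, plain maximum-entropy selection picks the isolated point (index 3), because its class distribution is the most uncertain. Weighting by the assumed influence index instead favors index 1, an uncertain point inside the cluster, which illustrates how isolated samples can be avoided.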