A competition strategy to cost-sensitive decision trees

  • Authors:
  • Fan Min; William Zhu

  • Affiliations:
  • Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou, China; Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou, China

  • Venue:
  • RSKT'12 Proceedings of the 7th international conference on Rough Sets and Knowledge Technology
  • Year:
  • 2012

Abstract

Learning from data with test costs and misclassification costs has been a hot topic in data mining, and many algorithms have been proposed to induce decision trees for this purpose. This paper studies a number of such algorithms and presents a competition strategy to obtain trees with lower cost. First, we generate a population of decision trees using the λ-ID3 and EG2 algorithms, both of which take information gain and test cost into account. λ-ID3 is a generalization of three existing algorithms, namely ID3, IDX, and CS-ID3. EG2 is another parameterized algorithm, and its parameter range is extended in this work. Second, we post-prune these trees by considering the tradeoff between test cost and misclassification cost. Finally, we select the best decision tree for classification. Experimental results on the mushroom dataset with various cost settings indicate that: 1) no single parameter setting is optimal for λ-ID3 or EG2; 2) the competition strategy is effective in selecting an appropriate decision tree; and 3) post-pruning can help decrease the average cost effectively.
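
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the formulas. The λ-ID3 score is assumed here to be information gain multiplied by test cost raised to a non-positive power λ, which reduces to ID3 at λ = 0, to the gain/cost ratio of IDX at λ = -1, and to the same attribute ranking as CS-ID3 (gain²/cost) at λ = -0.5. The EG2 score uses the commonly cited form (2^gain - 1) / (cost + 1)^ω. All function names, the pruning rule, and the numeric values are hypothetical and chosen only for illustration.

```python
def lambda_id3_score(gain, test_cost, lam):
    """Assumed lambda-ID3 heuristic: gain * test_cost**lam, with lam <= 0.

    Requires test_cost > 0 when lam < 0.
    """
    return gain * test_cost ** lam


def eg2_score(gain, test_cost, omega):
    """EG2 information cost function: (2**gain - 1) / (test_cost + 1)**omega."""
    return (2.0 ** gain - 1.0) / (test_cost + 1.0) ** omega


def pick_attribute(gains, costs, score, param):
    """Return the attribute with the highest heuristic score for one parameter value."""
    return max(gains, key=lambda a: score(gains[a], costs[a], param))


def should_prune(cost_as_leaf, cost_as_subtree):
    """Hypothetical pruning rule: collapse a subtree into a leaf when the leaf's
    misclassification cost does not exceed the subtree's combined test and
    misclassification cost."""
    return cost_as_leaf <= cost_as_subtree


def compete(avg_costs):
    """Competition step: return the label of the tree with the lowest average cost."""
    return min(avg_costs, key=avg_costs.get)


# Made-up gains and test costs for two mushroom attributes (illustration only).
gains = {"odor": 0.9, "cap-shape": 0.5}
costs = {"odor": 10.0, "cap-shape": 1.0}

for lam in (0.0, -0.5, -1.0):
    print("lambda-ID3", lam, pick_attribute(gains, costs, lambda_id3_score, lam))
for omega in (0.0, 1.0):
    print("EG2", omega, pick_attribute(gains, costs, eg2_score, omega))

# Made-up average costs of already pruned candidate trees.
candidates = {"lambda-ID3, lam=0": 12.4, "lambda-ID3, lam=-1": 9.8, "EG2, omega=1": 10.1}
print("winner:", compete(candidates))
```

With these made-up numbers, the expensive attribute wins when cost is ignored and the cheap one wins once cost is weighted heavily, which is why a single parameter value need not suit every cost setting and why comparing the resulting trees by average cost is a natural selection step.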