Missing or absent? A Question in Cost-sensitive Decision Tree

  • Authors:
  • Zhenxing Qin;Shichao Zhang;Chengqi Zhang

  • Affiliations:
  • Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia, {zqin, zhangsc, chengqi}@it.uts.edu.au

  • Venue:
  • Proceedings of the 2006 conference on Advances in Intelligent IT: Active Media Technology 2006
  • Year:
  • 2006

Abstract

One common source of error in data is the presence of missing values. Imputation, in which missing values are replaced by estimated values, is a widely used technique in the preprocessing phase of data mining. Previous work has sought to recover the “original” values according to specific criteria, such as statistical measures. However, in cost-sensitive learning, minimal overall cost is the most important issue, i.e. a value that minimizes total cost is preferred over the value that common sense would judge “best”. For example, in medical domains, some data fields are often left absent because the known information is enough for a decision. In this paper, we propose a new method to study the problem of “missing or absent values?” in cost-sensitive learning. Experimental results show some improvements when missing and absent data are distinguished in a cost-sensitive decision tree.
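
The difference between a missing value and an absent one can be made concrete with a small sketch. It is not the method proposed in the paper, which the abstract does not specify. It only shows one plausible way to keep the two cases apart during preprocessing: values marked as missing are filled by simple mode imputation, while values marked as absent are left as a category of their own. The markers `MISSING` and `ABSENT`, the function `impute_mode`, and the `blood_test` field are all hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical markers, not taken from the paper.
MISSING = None        # the value was wanted but was not recorded
ABSENT = "ABSENT"     # the value was deliberately not collected


def impute_mode(rows, field):
    """Fill MISSING entries of `field` with the most common observed value.

    ABSENT entries are left untouched, so a downstream learner can treat
    "not collected" as a distinct category. The input rows are not modified.
    """
    observed = [r[field] for r in rows if r[field] not in (MISSING, ABSENT)]
    if not observed:
        # Nothing to estimate from: return unchanged copies.
        return [dict(r) for r in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**r, field: mode} if r[field] is MISSING else dict(r)
        for r in rows
    ]


# Toy records, invented for this example only.
records = [
    {"blood_test": "high"},
    {"blood_test": "high"},
    {"blood_test": "low"},
    {"blood_test": MISSING},
    {"blood_test": ABSENT},
]
imputed = impute_mode(records, "blood_test")
```

In this sketch the fourth record receives the most frequent observed value, while the fifth keeps its absent marker. A cost-sensitive learner could then decide how to handle the absent case, for instance by taking into account the cost of the test that was skipped.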