Much work has been done on test-cost sensitive learning with missing values, but previous strategies face a conflict between efficiency and accuracy. Sequential test strategies achieve high accuracy but low efficiency because of their sequential nature, while batch strategies are efficient but perform poorly because they make all decisions at once using only the initial information. In this paper, we propose a new test strategy, the GTD algorithm, to address this problem. Our algorithm uses training data to judge the benefit brought by each unknown attribute and chooses the most useful unknown attribute at each step until no rewarding unknown attribute remains. Judging the utility of an unknown attribute from its actual performance on training data is more reasonable than relying on an estimate. Our strategy is valuable because it is highly efficient (it uses only training data, so GTD is not sequential) and at the same time achieves lower total costs than previous strategies. Experiments also show that our algorithm significantly outperforms previous algorithms, especially under high missing rates and large fluctuations in test costs.
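The greedy loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual GTD implementation: the function `estimated_gain` is a hypothetical stand-in for measuring an attribute's benefit on training data, and `test_costs` is an assumed per-attribute cost table. Note that every step consults only training data, so no new test results are needed between iterations, which is consistent with the claim that the strategy is not sequential.

```python
def greedy_test_selection(unknown_attrs, test_costs, estimated_gain):
    """Repeatedly pick the unknown attribute whose training-data benefit
    most exceeds its test cost; stop when no attribute is rewarding."""
    chosen = []
    remaining = set(unknown_attrs)
    while remaining:
        # Net utility = estimated benefit on training data minus test cost.
        net = {a: estimated_gain(a, chosen) - test_costs[a] for a in remaining}
        best = max(net, key=net.get)
        if net[best] <= 0:
            break  # no remaining unknown attribute is worth its test cost
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy usage with fixed (hypothetical) gains and costs:
costs = {"a": 1.0, "b": 5.0, "c": 2.0}
gains = {"a": 3.0, "b": 4.0, "c": 1.0}
selected = greedy_test_selection(["a", "b", "c"], costs,
                                 lambda attr, chosen: gains[attr])
print(selected)  # only "a" has a positive net utility (3.0 - 1.0)
```

In this toy run, attribute "a" is chosen first (net utility 2.0), after which the remaining attributes "b" and "c" both cost more than they gain, so the loop stops.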