Much work has been done on test-cost sensitive learning with missing values, but previous strategies face a conflict between efficiency and accuracy. Sequential test strategies achieve high accuracy but low efficiency because of their sequential nature, while batch strategies are efficient but perform poorly because they make all decisions at once using only the initial information. In this paper, we propose a new test strategy, the GTD algorithm, to address this problem. Our algorithm uses training data to judge the benefit brought by each unknown attribute and chooses the most useful unknown attribute at each step until no rewarding unknown attribute remains. Judging the utility of an unknown attribute from its actual performance on training data is more reasonable than relying on an estimate. Our strategy is valuable because it is highly efficient (it uses only training data, so GTD is not sequential) and at the same time achieves lower total costs than previous strategies. Experiments also show that our algorithm significantly outperforms previous algorithms, especially under high missing rates and large fluctuations in test costs.
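The greedy loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual GTD implementation: the function `estimated_gain` is a hypothetical stand-in for measuring an attribute's benefit on training data, and `test_costs` is an assumed per-attribute cost table. Note that every step consults only training data, so no new test results are needed between iterations, which is consistent with the claim that the strategy is not sequential.

```python
def greedy_test_selection(unknown_attrs, test_costs, estimated_gain):
    """Repeatedly pick the unknown attribute whose training-data benefit
    most exceeds its test cost; stop when no attribute is rewarding."""
    chosen = []
    remaining = set(unknown_attrs)
    while remaining:
        # Net utility = estimated benefit on training data minus test cost.
        net = {a: estimated_gain(a, chosen) - test_costs[a] for a in remaining}
        best = max(net, key=net.get)
        if net[best] <= 0:
            break  # no remaining unknown attribute is worth its test cost
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy usage with fixed (hypothetical) gains and costs:
costs = {"a": 1.0, "b": 5.0, "c": 2.0}
gains = {"a": 3.0, "b": 4.0, "c": 1.0}
selected = greedy_test_selection(["a", "b", "c"], costs,
                                 lambda attr, chosen: gains[attr])
print(selected)  # only "a" has a positive net utility (3.0 - 1.0)
```

In this toy run, attribute "a" is chosen first (net utility 2.0), after which the remaining attributes "b" and "c" both cost more than they gain, so the loop stops.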