An empirical study of the noise impact on cost-sensitive learning

Authors:
Xingquan Zhu;Xindong Wu;Taghi M. Khoshgoftaar;Yong Shi
Affiliations:
Dept. of Computer Science & Eng., Florida Atlantic University, Boca Raton, FL and Graduate University, Chinese Academy of Sciences, Beijing, China;Department of Computer Science, University of Vermont, Burlington, VT;Dept. of Computer Science & Eng., Florida Atlantic University, Boca Raton, FL;Graduate University, Chinese Academy of Sciences, Beijing, China
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Year:
2007

Citing 9
Cited 8

Cost-Sensitive Learning of Classification Knowledge and Its Applications in Robotics

Machine Learning
MetaCost: a general method for making classifiers cost-sensitive

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Induction of Decision Trees

Machine Learning
Pruning Decision Trees with Misclassification Costs

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Pruning Improves Heuristic Search for Cost-Sensitive Learning

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Experiments with Noise Filtering in a Medical Domain

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Cost-Sensitive Learning by Cost-Proportionate Example Weighting

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
An iterative method for multi-class cost-sensitive learning

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Cost-Guided Class Noise Handling for Effective Cost-Sensitive Learning

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining

A hierarchical model for test-cost-sensitive decision systems

Information Sciences: an International Journal
Mining in Large Noisy Domains

Journal of Data and Information Quality (JDIQ)
Which Is Better for Frequent Pattern Mining: Approximate Counting or Sampling?

DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Anytime induction of low-cost, low-error classifiers: a sampling-based approach

Journal of Artificial Intelligence Research
CSNL: A cost-sensitive non-linear decision tree algorithm

ACM Transactions on Knowledge Discovery from Data (TKDD)
An exploration of learning when data is noisy and imbalanced

Intelligent Data Analysis
Discovery of frequent patterns in transactional data streams

Transactions on large-scale data- and knowledge-centered systems II
Discovery of frequent patterns in transactional data streams

Transactions on large-scale data- and knowledge-centered systems II

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we perform an empirical study of the impact of noise on cost-sensitive (CS) learning, through observations on how a CS learner reacts to the mislabeled training examples in terms of misclassification cost and classification accuracy. Our empirical results and theoretical analysis indicate that mislabeled training examples can raise serious concerns for cost-sensitive classification, especially when misclassifying some classes becomes extremely expensive. Compared to general inductive learning, the problem of noise handling and data cleansing is more crucial, and should be carefully investigated to ensure the success of CS learning.