"Missing Is Useful': Missing Values in Cost-Sensitive Decision Trees

Authors:
Shichao Zhang;Zhenxing Qin;Charles X. Ling;Shengli Sheng
Affiliations:
IEEE;-;-;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Abstract

Many real-world data sets for machine learning and data mining contain missing values, and much previous research regards this as a problem and attempts to impute the missing values before training and testing. In this paper, we study this issue in cost-sensitive learning that considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it is more cost-effective to leave them missing, similar to skipping expensive and risky tests (missing values) in patient diagnosis (classification). That is, "missing is useful": missing values actually reduce the total cost of tests and misclassifications, and it is therefore not meaningful to impute them. We discuss and compare several strategies that utilize only known values and exploit the fact that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
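
The trade-off between test costs and misclassification costs described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it only assumes that an example whose attribute value is missing pays no test cost for that attribute, and that each node is labeled with whichever class gives the lower misclassification cost. The function names, the cost values, and the branch counts are all hypothetical.

```python
def leaf_cost(n_pos, n_neg, fp_cost, fn_cost):
    """Misclassification cost of a node labeled with its cheaper class."""
    cost_if_predict_pos = n_neg * fp_cost  # every negative becomes a false positive
    cost_if_predict_neg = n_pos * fn_cost  # every positive becomes a false negative
    return min(cost_if_predict_pos, cost_if_predict_neg)


def split_cost(branches, test_cost, fp_cost, fn_cost):
    """Total cost of splitting a node on one attribute.

    branches maps an attribute value to (n_pos, n_neg) counts.
    The key None holds examples whose value is missing; under the
    assumption made here, they incur no test cost.
    """
    total = 0
    for value, (n_pos, n_neg) in branches.items():
        if value is not None:
            total += test_cost * (n_pos + n_neg)  # the test was actually performed
        total += leaf_cost(n_pos, n_neg, fp_cost, fn_cost)
    return total


# Hypothetical node: 50 positive and 50 negative examples.
FP_COST, FN_COST = 200, 600
branches = {"high": (40, 5), "low": (5, 40), None: (5, 5)}

no_split = leaf_cost(50, 50, FP_COST, FN_COST)
cheap_test = split_cost(branches, 20, FP_COST, FN_COST)
expensive_test = split_cost(branches, 100, FP_COST, FN_COST)

print(no_split, cheap_test, expensive_test)
```

With these made-up numbers, the cheap test lowers the total cost below that of the unsplit node, while the expensive test raises it above, so skipping the expensive test (leaving its value missing) is the cheaper choice.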