Missing values prediction with K2

Authors:
Estevam R. Hruschka, Jr.; Nelson F. F. Ebecken
Affiliations:
E-mail: estevamr@terra.com.br; COPPE/Federal University of Rio de Janeiro, Brazil. E-mail: nelson@ntt.ufrj.br
Venue:
Intelligent Data Analysis
Year:
2002

Abstract

Dealing with missing values is an important task in data mining. There are many ways to handle such data, but the literature does not identify a single method that is best for all kinds of data sets. The aim of this work is to show how a Bayesian algorithm (K2) can be applied to data mining problems as both a data preparation and a classification tool. In this paper, the algorithm generates a Bayesian network that is used to replace the missing values. This is done by predicting the most probable values for the missing features of each object (record) in the database. The prediction uses a heuristic Bayesian conditioning algorithm, producing a preprocessed sample. Classification is then performed on this preprocessed sample, and the classification results obtained with and without the data preparation step are analyzed.
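The pipeline the abstract outlines (learn a Bayesian network with K2, then fill each gap with its most probable value given the observed attributes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: attributes are assumed discrete and coded as integers 0..r-1, the structure and conditional probability tables (CPTs) are learned from complete records only, K2's required node ordering is supplied by hand, CPTs use Laplace smoothing, and the most probable completion is found by exact enumeration rather than by the heuristic conditioning algorithm the paper uses. The names `MISSING`, `k2_log_score`, `k2`, `fit_cpts`, `impute` and the parameters `max_parents` and `alpha` are hypothetical. The scoring function is the Cooper-Herskovits (K2) metric, computed in log space with `math.lgamma` so the factorials do not overflow.

```python
from collections import Counter
from itertools import product
from math import lgamma

MISSING = None  # hypothetical marker for a missing attribute value


def k2_log_score(data, i, parents, arity):
    """Log of the K2 (Cooper-Herskovits) score for node i given a parent list."""
    r = arity[i]
    n_ijk = Counter()  # (parent configuration, value of node i) -> count
    n_ij = Counter()   # parent configuration -> count
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_ijk[(cfg, row[i])] += 1
        n_ij[cfg] += 1
    score = 0.0
    for cfg, n in n_ij.items():  # unseen configurations contribute log(1) = 0
        score += lgamma(r) - lgamma(n + r)  # log((r-1)! / (N_ij + r - 1)!)
        score += sum(lgamma(n_ijk[(cfg, k)] + 1) for k in range(r))  # log prod N_ijk!
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: each node may only take parents that precede it in `order`."""
    parents = {}
    for pos, i in enumerate(order):
        pa, best = [], k2_log_score(data, i, [], arity)
        while len(pa) < max_parents:
            scored = [(k2_log_score(data, i, pa + [z], arity), z)
                      for z in order[:pos] if z not in pa]
            if not scored:
                break
            s, z = max(scored)
            if s <= best:  # stop when no single added parent improves the score
                break
            best, pa = s, pa + [z]
        parents[i] = pa
    return parents


def fit_cpts(data, parents, arity, alpha=1.0):
    """Estimate P(x_i = k | parent configuration) with Laplace smoothing `alpha`."""
    cpts = {}
    for i, pa in parents.items():
        counts = Counter((tuple(row[p] for p in pa), row[i]) for row in data)
        cpts[i] = {}
        for cfg in product(*(range(arity[p]) for p in pa)):
            total = sum(counts[(cfg, k)] for k in range(arity[i])) + alpha * arity[i]
            cpts[i][cfg] = [(counts[(cfg, k)] + alpha) / total for k in range(arity[i])]
    return cpts


def impute(row, parents, cpts, arity):
    """Replace missing entries with their jointly most probable values given the
    observed ones, by exact enumeration (exponential in the number of gaps)."""
    gaps = [i for i, v in enumerate(row) if v is MISSING]
    best, best_p = list(row), -1.0
    for values in product(*(range(arity[i]) for i in gaps)):
        full = list(row)
        for i, v in zip(gaps, values):
            full[i] = v
        p = 1.0  # joint probability = product of each node's CPT entry
        for i, pa in parents.items():
            p *= cpts[i][tuple(full[q] for q in pa)][full[i]]
        if p > best_p:  # argmax of the joint is also argmax of P(gaps | observed)
            best, best_p = full, p
    return best


if __name__ == "__main__":
    # Hypothetical toy data: x1 usually copies x0, x2 usually copies x1.
    complete = ([[0, 0, 0] for _ in range(8)] + [[1, 1, 1] for _ in range(8)]
                + [[0, 1, 1], [1, 0, 0]])
    arity = [2, 2, 2]
    parents = k2(complete, order=[0, 1, 2], arity=arity)
    cpts = fit_cpts(complete, parents, arity)
    print(parents)                                               # {0: [], 1: [0], 2: [1]}
    print(impute([1, MISSING, MISSING], parents, cpts, arity))   # [1, 1, 1]
```

Because exact enumeration grows exponentially with the number of missing entries in a record, this version only suits records with few gaps; a heuristic conditioning scheme such as the one the abstract mentions is one way around that cost. The completed records can then be handed to any classifier to compare results with and without this preparation step.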