Missing values prediction with K2

  • Authors:
  • Estevam R. Hruschka, Jr.; Nelson F. F. Ebecken

  • Affiliations:
  • E-mail: estevamr@terra.com.br; COPPE/Federal University of Rio de Janeiro, Brazil. E-mail: nelson@ntt.ufrj.br

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2002

Abstract

Dealing with missing values is an important task in data mining. There are many ways to handle such data, but the literature does not identify a single method that is best for all kinds of data sets. The aim of this work is to show the application of a Bayesian algorithm (K2) to data mining problems as a data preparation and classification tool. In this paper, the algorithm generates a Bayesian network that is used to substitute the missing values. This is done by predicting the most probable value for the features of each object in the database. The prediction uses a heuristic Bayesian conditioning algorithm, which generates a preprocessed sample. The classification is then performed on this preprocessed sample, and the classification results obtained with and without the data preparation are analyzed.
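The abstract does not spell out the conditioning heuristic, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the network is learned with the standard greedy K2 search (the Cooper-Herskovits score) from the complete rows only, given a fixed node ordering and a parent limit; that conditional probabilities are Laplace-smoothed counts; and that each incomplete row is filled with the jointly most probable assignment of its missing values. The function names, the `max_parents` limit and the exhaustive search over missing values are hypothetical illustrative choices, not the authors' procedure.

```python
import itertools
import math
from collections import Counter


def family_counts(data, node, parents):
    """Count N_ijk: occurrences of (parent configuration j, node value k)."""
    return Counter((tuple(row[p] for p in parents), row[node]) for row in data)


def k2_log_score(data, node, parents, arity):
    """Log of the K2 (Cooper-Herskovits) score for one node and a parent set."""
    r = arity[node]
    counts = family_counts(data, node, parents)
    totals = Counter()
    for (j, _k), n in counts.items():
        totals[j] += n
    score = 0.0
    for n_ij in totals.values():
        score += math.lgamma(r) - math.lgamma(n_ij + r)  # log[(r-1)! / (N_ij+r-1)!]
    for n_ijk in counts.values():
        score += math.lgamma(n_ijk + 1)  # log(N_ijk!)
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: a node may only take parents that precede it in `order`."""
    parents = {}
    for pos, node in enumerate(order):
        current = []
        best = k2_log_score(data, node, current, arity)
        while len(current) < max_parents:
            candidates = [z for z in order[:pos] if z not in current]
            if not candidates:
                break
            score, z = max((k2_log_score(data, node, current + [z], arity), z)
                           for z in candidates)
            if score <= best:  # stop when no single extra parent improves the score
                break
            best = score
            current.append(z)
        parents[node] = current
    return parents


def log_joint(row, parents, fam, par, arity):
    """log P(row) under the network, using Laplace-smoothed conditional tables."""
    total = 0.0
    for node, pa in parents.items():
        j = tuple(row[p] for p in pa)
        total += math.log((fam[node][(j, row[node])] + 1) / (par[node][j] + arity[node]))
    return total


def impute(rows, arity, order, max_parents=2):
    """Learn a K2 network from complete rows, then fill each None with the
    values that maximize the joint probability of the whole row."""
    complete = [r for r in rows if None not in r]
    parents = k2(complete, order, arity, max_parents)
    fam = {v: family_counts(complete, v, pa) for v, pa in parents.items()}
    par = {v: Counter(tuple(r[p] for p in pa) for r in complete)
           for v, pa in parents.items()}
    filled = []
    for row in rows:
        missing = [i for i, v in enumerate(row) if v is None]
        best_row, best_lp = tuple(row), -math.inf
        if missing:
            # Exhaustive search: only practical for a few missing values per row.
            for combo in itertools.product(*(range(arity[i]) for i in missing)):
                cand = list(row)
                for i, v in zip(missing, combo):
                    cand[i] = v
                lp = log_joint(cand, parents, fam, par, arity)
                if lp > best_lp:
                    best_row, best_lp = tuple(cand), lp
        filled.append(best_row)
    return filled, parents


# Toy usage: in the complete rows, X1 and X2 simply copy X0.
rows = [(0, 0, 0)] * 5 + [(1, 1, 1)] * 5 + [(1, None, 1), (0, 0, None)]
filled, parents = impute(rows, arity=[2, 2, 2], order=[0, 1, 2])
print(parents)      # learned parent sets
print(filled[-2:])  # [(1, 1, 1), (0, 0, 0)]
```

Maximizing the joint probability of the whole row is equivalent to conditioning each missing attribute on everything observed in that row, which is why the search is done over complete assignments rather than one attribute at a time. The filled sample could then be handed to any classifier, mirroring the "with and without data preparation" comparison described above.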