Web-Based Knowledge Acquisition to Impute Missing Values for Classification

  • Authors:
  • Na Tang; V. Rao Vemuri

  • Affiliations:
  • University of California, Davis; University of California, Davis

  • Venue:
  • WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2004

Abstract

Machine learning is the science of building predictors from data while accounting for the predictor's accuracy on future data. Many machine learning classifiers make accurate predictions when the data are complete. When some values are missing, statistical methods can be applied to fill in a few of them. However, these methods rely only on the available data to estimate the missing values, and they perform poorly once the percentage of missing values exceeds a certain threshold. An alternative is to fill in the missing data through an automated knowledge-discovery process that mines the WWW. This novel procedure first restores the missing information and then learns the parameters of the classifier from the restored data. With a Bayesian network as the classifier, the parameters, i.e., the probabilities associated with the causal relationships in the network, are derived from the knowledge mined from the WWW in conjunction with the data at hand. When tested on heart disease data sets from the UC Irvine Machine Learning Repository [UCI repository of machine learning databases], the method gave satisfactory results.
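The two-step idea (restore missing values, then learn network parameters) can be illustrated with a minimal sketch. The abstract does not say how web-mined knowledge is represented or combined with the data, so this assumes, hypothetically, that mining yields a conditional distribution P(child | parent) for one edge of the network, that each record with a missing child value is restored as fractional (expected) counts drawn from that distribution, and that the conditional probability table is then the normalized count table of the restored data. The names `restore_and_estimate`, `web_prior`, and the `old`/`young` and `yes`/`no` values are illustrative only, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (parent value, child value or None if the child value is missing).
# Assumption: the parent value itself is always observed.
Row = Tuple[str, Optional[str]]


def restore_and_estimate(
    rows: List[Row],
    web_prior: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(child | parent) after restoring missing child values.

    Hypothetical scheme: a missing child value is replaced by fractional
    counts given by the web-derived distribution for its parent; the CPT
    is then the normalized count table of the restored data.
    """
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for parent, child in rows:
        if child is None:
            # Restore step: spread one record over the child values
            # in proportion to the web-derived probabilities.
            for value, p in web_prior[parent].items():
                counts[parent][value] += p
        else:
            # Observed record contributes a full count.
            counts[parent][child] += 1.0

    # Learning step: normalize each parent's counts into a distribution.
    cpt: Dict[str, Dict[str, float]] = {}
    for parent, table in counts.items():
        total = sum(table.values())
        cpt[parent] = {value: c / total for value, c in table.items()}
    return cpt


# Made-up example data; not from the paper or the UCI data sets.
web_prior = {
    "old": {"yes": 0.6, "no": 0.4},
    "young": {"yes": 0.2, "no": 0.8},
}
rows: List[Row] = [
    ("old", "yes"), ("old", "yes"), ("old", "no"), ("old", None),
    ("young", "no"), ("young", None),
]
cpt = restore_and_estimate(rows, web_prior)
print(cpt)
```

With these made-up numbers, the missing `old` record contributes 0.6 of a `yes` and 0.4 of a `no`, so P(yes | old) = 2.6 / 4 = 0.65, and P(no | young) = 1.8 / 2 = 0.9. Fractional restoration keeps the uncertainty in the mined knowledge rather than committing each missing value to a single guess; a hard fill with the most probable value would be a simpler alternative.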