Web-Based Knowledge Acquisition to Impute Missing Values for Classification

Authors:
Na Tang; V. Rao Vemuri
Affiliations:
University of California, Davis; University of California, Davis
Venue:
WI '04: Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2004

Abstract

Machine learning is the science of building predictors from data while accounting for the predictors' accuracy on future data. Many machine learning classifiers make accurate predictions when the data is complete. When the data is incomplete, statistical methods can be applied to fill in a few missing items. However, these methods estimate the missing values from the available data alone, and they perform poorly once the percentage of missing values exceeds some threshold. An alternative is to fill in the missing data through an automated knowledge discovery process that mines the World Wide Web (WWW). This novel procedure first restores the missing information and then learns the parameters of the classifier from the restored data. With a Bayesian network as the classifier, the parameters, i.e., the probabilities associated with the causal relationships in the network, are derived from the knowledge mined from the WWW in conjunction with the data at hand. When tested on heart disease data sets from the UC Irvine Machine Learning Repository [UCI repository of machine learning databases], the method gave satisfactory results.
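The abstract does not specify how the web-mined probabilities are combined with the data, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a naive Bayes structure (the class node is the sole parent of every attribute) as a simple stand-in for a general Bayesian network classifier; a hypothetical table `WEB_KNOWLEDGE` of P(attribute | class) with made-up numbers in place of real web-extracted knowledge; imputation by the most probable value under that table; and, as a swapped-in standard technique, blending the restored data with the web knowledge by treating it as Dirichlet pseudo-counts with equivalent sample size `alpha`. The attribute names `smoker`, `chest_pain`, and `disease` are hypothetical and do not reproduce the UCI heart disease encoding.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# A record maps attribute names to discrete values; None marks a missing value.
Record = Dict[str, Optional[str]]

# HYPOTHETICAL "web-mined" knowledge: P(attribute = value | class = c).
# The numbers are invented for illustration only.
WEB_KNOWLEDGE: Dict[str, Dict[str, Dict[str, float]]] = {
    "smoker": {
        "present": {"yes": 0.7, "no": 0.3},
        "absent": {"yes": 0.3, "no": 0.7},
    },
    "chest_pain": {
        "present": {"typical": 0.6, "none": 0.4},
        "absent": {"typical": 0.2, "none": 0.8},
    },
}
CLASS = "disease"  # hypothetical class attribute; assumed never missing


def impute(records: List[Record]) -> List[Record]:
    """Step 1: restore each missing attribute with its most probable value
    given the record's class, according to the web-derived distribution."""
    restored = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        dist_by_class = None
        for attr, table in WEB_KNOWLEDGE.items():
            if rec.get(attr) is None:
                dist_by_class = table[rec[CLASS]]
                rec[attr] = max(dist_by_class, key=dist_by_class.get)
        restored.append(rec)
    return restored


def learn_cpts(records: List[Record], alpha: float = 4.0):
    """Step 2: estimate P(class) from the data and P(attr | class) by adding
    alpha * P_web(value | class) pseudo-counts to the observed counts."""
    class_counts = Counter(r[CLASS] for r in records)
    n = len(records)
    prior = {c: k / n for c, k in class_counts.items()}
    cpts = {}
    for attr, table in WEB_KNOWLEDGE.items():
        counts = defaultdict(Counter)
        for r in records:
            counts[r[CLASS]][r[attr]] += 1
        cpts[attr] = {
            c: {v: (counts[c][v] + alpha * p) / (class_counts[c] + alpha)
                for v, p in table[c].items()}
            for c in class_counts
        }
    return prior, cpts


def classify(rec: Record, prior, cpts) -> str:
    """Return the class with the highest (unnormalized) posterior. Missing
    attributes are skipped, which marginalizes them out in naive Bayes."""
    scores = {}
    for c, pc in prior.items():
        score = pc
        for attr, cpt in cpts.items():
            v = rec.get(attr)
            if v is not None:
                score *= cpt[c].get(v, 0.0)
        scores[c] = score
    return max(scores, key=scores.get)


data = [
    {"smoker": "yes", "chest_pain": "typical", "disease": "present"},
    {"smoker": None, "chest_pain": "typical", "disease": "present"},
    {"smoker": "no", "chest_pain": None, "disease": "absent"},
    {"smoker": "no", "chest_pain": "none", "disease": "absent"},
]
restored = impute(data)
prior, cpts = learn_cpts(restored)
print(classify({"smoker": "yes", "chest_pain": None}, prior, cpts))  # present
```

With `alpha = 4`, the two "yes" smokers among the "present" records plus 4 × 0.7 pseudo-counts give P(smoker = yes | present) = (2 + 2.8) / (2 + 4) = 0.8. A larger `alpha` lets the web knowledge dominate the estimates; a smaller one lets the observed data dominate.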