Improving mining quality by exploiting data dependency

Authors:
Fang Chu;Yizhou Wang;Carlo Zaniolo;D. Stott Parker
Affiliations:
University of California, Los Angeles, CA;University of California, Los Angeles, CA;University of California, Los Angeles, CA;University of California, Los Angeles, CA
Venue:
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2005

Citing 9
Cited 0

Probabilistic reasoning in intelligent systems: networks of plausible inference

Probabilistic reasoning in intelligent systems: networks of plausible inference
Discovering informative patterns and data cleaning

Advances in knowledge discovery and data mining
KDD Cup 2001 report

ACM SIGKDD Explorations Newsletter
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Exploiting unlabeled data in ensemble methods

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Dealing with predictive-but-unpredictable attributes in noisy data sources

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Model-driven data acquisition in sensor networks

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Loopy belief propagation for approximate inference: an empirical study

UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Turbo decoding as an instance of Pearl's “belief propagation” algorithm

IEEE Journal on Selected Areas in Communications

Quantified Score

Hi-index	0.00

Visualization

Abstract

The usefulness of the results produced by data mining methods can be critically impaired by several factors such as (1) low quality of data, including errors due to contamination, or incompleteness due to limited bandwidth for data acquisition, and (2) inadequacy of the data model for capturing complex probabilistic relationships in data. Fortunately, a wide spectrum of applications exhibit strong dependencies between data samples. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. Therefore, dependencies among data can be successfully exploited to remedy the problems mentioned above. In this paper, we propose a unified approach to improving mining quality using Markov networks as the data model to exploit local dependencies. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, so as to clean the data, to infer missing values, or to improve the mining results from a model that ignores these dependencies. To illustrate the benefits and great generality of the technique, we present its application to three challenging problems: (i) cost-efficient sensor probing, (ii) enhancing protein function predictions, and (iii) sequence data denoising.