Quality-Aware Association Rule Mining

Authors:
Laure Berti-Équille
Affiliations:
IRISA, Campus Universitaire de Beaulieu, Rennes, France
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Abstract

The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence), which are meant to help the user understand and use the newly discovered knowledge. Low-quality datasets severely degrade the quality of the discovered association rules, and one might legitimately wonder whether a so-called “interesting” rule, noted LHS → RHS, is meaningful when 30% of LHS data are no longer up to date, 20% of RHS data are not accurate, and 15% of LHS data come from a data source that is well known for its poor credibility. In this paper we propose to integrate data quality measures for effective, quality-aware association rule mining, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show, for different variations of data quality indicators, the corresponding cost and quality of the discovered association rules that can (or cannot) be legitimately selected.
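To make the two interestingness measures named in the abstract concrete, the following minimal sketch computes the standard support and confidence of a rule LHS → RHS over a list of transactions. It illustrates only the textbook definitions, not the paper's cost-based probabilistic model or its data quality measures; the function names and the toy transactions are assumptions introduced here for illustration.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    if not transactions:
        return 0.0
    matching = sum(1 for t in transactions if items <= t)
    return matching / len(transactions)


def confidence(transactions: List[Set[str]],
               lhs: Iterable[str],
               rhs: Iterable[str]) -> float:
    """Estimate of P(RHS | LHS): support(LHS and RHS) / support(LHS)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0  # rule never fires; confidence is reported as 0 here
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Hypothetical toy data, not taken from KDD-CUP-98.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

On this toy data the rule {bread} → {milk} holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions containing bread (confidence 2/3). Neither number says anything about how fresh, accurate, or credible the underlying LHS and RHS values are, which is the gap the abstract points to.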