The quality of discovered association rules is commonly evaluated with interestingness measures (typically support and confidence), which are intended to help the user understand and use the newly discovered knowledge. Low-quality datasets severely degrade the quality of the discovered association rules, and one might legitimately wonder whether a so-called “interesting” rule, noted LHS → RHS, is meaningful when 30% of the LHS data are no longer up to date, 20% of the RHS data are inaccurate, and 15% of the LHS data come from a data source well known for its poor credibility. In this paper, we propose to integrate data quality measures for effective, quality-aware association rule mining, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show, for different variations of data quality indicators, the corresponding cost and quality of the discovered association rules that can (or cannot) be legitimately selected.
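For readers unfamiliar with the two interestingness measures named above, the following is a minimal sketch of how support and confidence are conventionally computed for a rule LHS → RHS. It is not the paper's cost-based probabilistic model and does not include any data quality measure; the function names `support` and `confidence` and the toy transactions are assumptions made purely for illustration.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]
) -> float:
    """support(LHS and RHS together) / support(LHS); 0.0 if LHS never occurs."""
    lhs_items = frozenset(lhs)
    lhs_support = support(transactions, lhs_items)
    if lhs_support == 0.0:
        return 0.0
    return support(transactions, lhs_items | frozenset(rhs)) / lhs_support


# Hypothetical toy data, not taken from KDD-CUP-98.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Both measures are computed from the data alone, so a rule can score well even when the records that support it are stale or inaccurate, which is the concern the abstract raises.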