The quality of discovered association rules is commonly evaluated with interestingness measures, most often support and confidence, which give the user indicators for understanding and using the newly discovered knowledge. Low-quality data severely degrade the quality of the discovered association rules. One might legitimately wonder whether a so-called “interesting” rule, denoted LHS → RHS, is meaningful when 30% of the LHS data are no longer up to date, 20% of the RHS data are inaccurate, and 15% of the LHS data come from a source well known for its poor credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed to make the knowledge discovery and data mining (KDD) process more quality-aware. We propose integrating data quality indicators into association rule mining to make it quality-aware, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality strongly affect the cost and quality of the discovered association rules. They also confirm our approach of integrating the management of data quality indicators into the KDD process to ensure the quality of data mining results.
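As a concrete reference point for the two measures named above, the sketch below computes support and confidence for a rule LHS → RHS over a toy set of transactions. It also includes `naive_quality_factor`, a hypothetical helper that is not the paper's cost-based probabilistic model. It only illustrates the abstract's question by estimating the fraction of records free of all listed defects, under the assumption that the defects occur independently, and scaling confidence by that fraction. The item names and transactions are invented for illustration.

```python
import math
from typing import FrozenSet, Iterable, Sequence

Itemset = FrozenSet[str]


def support(transactions: Sequence[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: Sequence[Itemset], lhs: Itemset, rhs: Itemset) -> float:
    """Confidence of LHS -> RHS: support(LHS | RHS) / support(LHS)."""
    s_lhs = support(transactions, lhs)
    if s_lhs == 0.0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, lhs | rhs) / s_lhs


def naive_quality_factor(defect_rates: Iterable[float]) -> float:
    """HYPOTHETICAL helper, not the paper's model: fraction of records
    free of every listed defect, assuming defects occur independently."""
    return math.prod(1.0 - r for r in defect_rates)


if __name__ == "__main__":
    # Toy transactions (invented for illustration).
    txns = [
        frozenset({"bread", "milk"}),
        frozenset({"bread", "butter"}),
        frozenset({"bread", "milk", "butter"}),
        frozenset({"milk"}),
    ]
    lhs, rhs = frozenset({"bread"}), frozenset({"milk"})
    conf = confidence(txns, lhs, rhs)
    # Defect rates from the abstract's example: 30% stale LHS data,
    # 20% inaccurate RHS data, 15% LHS data from a low-credibility source.
    q = naive_quality_factor([0.30, 0.20, 0.15])
    print(f"support={support(txns, lhs | rhs):.2f} confidence={conf:.2f}")
    print(f"quality factor={q:.3f} discounted confidence={conf * q:.3f}")
```

On these toy transactions, bread → milk has support 0.50 and confidence 0.67. Under the independence assumption, the abstract's defect rates leave only about 0.476 of the records defect-free, which is enough to cast doubt on a rule that looks interesting by support and confidence alone.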