In today’s data-rich environment, decision makers draw conclusions from data repositories that may contain data quality problems. Missing data is a well-known and important problem in this context, since it can seriously affect the accuracy of the conclusions drawn. Researchers have described several approaches for dealing with missing data, primarily by inferring the missing values or by estimating the impact of missing data on conclusions. However, few have considered approaches that characterize patterns of bias in missing data, that is, that determine the specific attributes that predict the missingness of data values. Knowledge of the specific systematic bias patterns in the incidence of missing data can help analysts assess more accurately the quality of conclusions drawn from data sets with missing data. This research proposes a methodology that combines a number of Knowledge Discovery and Data Mining techniques, including association rule mining, to discover patterns in related attribute values that help characterize these bias patterns. We demonstrate the efficacy of the proposed approach by applying it to a demonstration census data set seeded with biased missing data. The experimental results show that the approach was able to find seeded biases and filter out most of the seeded noise.
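The core idea, using association rules to find attribute values that predict missingness, can be illustrated with a minimal sketch. This is not the methodology proposed in the paper, only a simplified single-antecedent version of it under stated assumptions: the function name `missingness_rules`, the thresholds, the toy attributes (`region`, `employment`, `income`), and the seeded bias are all hypothetical and chosen for illustration. Each candidate rule has the form "attribute = value implies the target is missing" and is scored by support, confidence, and lift.

```python
from collections import Counter


def missingness_rules(records, target, min_support=0.05, min_confidence=0.6):
    """Find rules of the form (attr = value) -> (target is missing).

    Illustrative simplification: only single-attribute antecedents are
    considered, and a missing value is represented by None.
    """
    n = len(records)
    missing_total = sum(1 for r in records if r.get(target) is None)
    if n == 0 or missing_total == 0:
        return []
    base_rate = missing_total / n  # overall probability that target is missing

    antecedent_counts = Counter()  # how often each (attr, value) occurs
    joint_counts = Counter()       # how often it co-occurs with a missing target
    for r in records:
        is_missing = r.get(target) is None
        for attr, value in r.items():
            if attr == target or value is None:
                continue
            antecedent_counts[(attr, value)] += 1
            if is_missing:
                joint_counts[(attr, value)] += 1

    rules = []
    for (attr, value), joint in joint_counts.items():
        support = joint / n
        confidence = joint / antecedent_counts[(attr, value)]
        if support >= min_support and confidence >= min_confidence:
            rules.append({
                "attribute": attr,
                "value": value,
                "support": support,
                "confidence": confidence,
                "lift": confidence / base_rate,  # > 1 suggests systematic bias
            })
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


def make_demo_records(n=100):
    """Build a toy data set with a seeded bias (hypothetical data).

    Income is made missing for most self-employed respondents; region is
    unrelated to the seeding rule.
    """
    records = []
    for i in range(n):
        employment = "self-employed" if i % 4 == 0 else "salaried"
        region = "north" if i % 2 == 0 else "south"
        seeded_missing = employment == "self-employed" and i % 20 != 0
        income = None if seeded_missing else 50000
        records.append(
            {"region": region, "employment": employment, "income": income}
        )
    return records


rules = missingness_rules(make_demo_records(), target="income")
for rule in rules:
    print(rule)
```

On this toy data the only rule that passes both thresholds is the one tied to the seeded attribute (`employment = self-employed`). The `region` attribute co-occurs with missing income only by coincidence and is filtered out by the confidence threshold. A fuller treatment would mine multi-attribute antecedents, for example with an Apriori-style search.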