Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, its impact on the quality of the derived information, and how this in turn affects decisions. Assessing quality requires a framework that defines the metrics constituting the quality profile of a relation and provides mechanisms for evaluating them. We build on a quality framework proposed in prior work and develop quality profiles for the results of the primitive relational operations Difference and Union. These operations have nuances that make both classifying the resulting records and estimating the size of each class difficult, and quite different from the corresponding tasks for other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result are missing from it (such tuples are called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hypergeometric distribution. Finally, we discuss how the resulting quality profiles would influence decisions.
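To make the notion of a drawing process concrete, the following is a minimal sketch of a single hypergeometric draw, not the paper's actual decomposition. It assumes a relation of `N` tuples of which `K` belong to one quality class (say, accurate), and asks how many of `n` tuples drawn without replacement fall in that class. The function names and the numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): probability that exactly k of n tuples drawn without
    replacement from N tuples belong to a class containing K tuples."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_mean(N, K, n):
    """Expected number of drawn tuples that belong to the class."""
    return n * K / N


# Hypothetical relation: 1000 tuples, 850 of them accurate, 50 drawn.
N, K, n = 1000, 850, 50
probs = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
expected = sum(k * p for k, p in enumerate(probs))
```

Here `expected` agrees with the closed form `n * K / N`. Chaining several such draws, with the class sizes of one draw feeding the next, is one way to read "a sequence of drawing processes", though the abstract does not spell out the exact sequence.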