Impact of the Union and Difference Operations on the Quality of Information Products

Authors:
Amir Parssian;Sumit Sarkar;Varghese S. Jacob
Affiliations:
Department of Information Systems, Instituto de Empresa Business School, Madrid 28006, Spain;School of Management, University of Texas at Dallas, Richardson, Texas 75080;School of Management, University of Texas at Dallas, Richardson, Texas 75080
Venue:
Information Systems Research
Year:
2009

Abstract

Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, to its impact on the quality of the derived information, or to how this in turn affects decisions. Assessing quality requires a framework that defines the metrics constituting the quality profile of a relation and provides mechanisms for evaluating them. We build on a quality framework proposed in prior work and develop quality profiles for the results of the primitive relational operations Difference and Union. These operations have nuances that make both the classification of the resulting records and the estimation of the size of each class difficult, and very different from the corresponding problems for other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result do not (called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hypergeometric distribution. Finally, we discuss how the resulting quality profiles would influence decisions.
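
As a rough illustration of what a single drawing process of this kind looks like (not the authors' estimator, which the abstract does not spell out), suppose a relation contains a known number of accurate tuples and a subset of its tuples is drawn without replacement. The number of accurate tuples in the subset then follows a hypergeometric distribution. The Python sketch below uses hypothetical function names and made-up figures (100 tuples, 80 of them accurate, 10 drawn).

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """Probability that exactly k of the drawn tuples belong to the class
    of interest, when `draws` tuples are taken without replacement from
    `population` tuples, `successes` of which belong to that class."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    if k < lowest or k > highest:
        return 0.0
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def hypergeom_mean(population, successes, draws):
    """Expected number of drawn tuples that belong to the class of interest."""
    return draws * successes / population


# Hypothetical figures, chosen only for illustration.
population = 100   # tuples in the source relation
accurate = 80      # tuples assumed to be accurate
draws = 10         # tuples drawn without replacement

# Distribution of the number of accurate tuples among those drawn.
pmf = [hypergeom_pmf(k, population, accurate, draws) for k in range(draws + 1)]
expected_accurate = hypergeom_mean(population, accurate, draws)
```

Here `expected_accurate` is the mean of the distribution held in `pmf`. Chaining several such draws, one per class of tuple, is one plausible reading of the "sequence of drawing processes" mentioned in the abstract, but that reading is an assumption.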