An Accuracy Metric: Percentages, Randomness, and Probabilities

Authors:
Craig W. Fisher; Eitel J. M. Lauria; Carolyn C. Matheus
Affiliations:
Marist College; Marist College; State University of New York at Albany
Venue:
Journal of Data and Information Quality (JDIQ)
Year:
2009

Abstract

Practitioners and researchers regularly refer to error rates or accuracy percentages of databases. The former is the number of cells in error divided by the total number of cells; the latter is the number of correct cells divided by the total number of cells. However, databases may have similar error rates (or accuracy percentages) yet differ drastically in the complexity of their accuracy problems. A simple percentage does not indicate whether the errors are systematic or randomly distributed throughout the database. We expand the accuracy metric to include a randomness measure and a probability distribution value. The proposed randomness check is based on the Lempel-Ziv (LZ) complexity measure. Through two simulation studies we show that the LZ complexity measure can clearly distinguish random errors from systematic ones. This determination is a significant first step and a major departure from the percentage-alone technique. Once the errors are determined to be random, the Poisson distribution is used to help address various managerial questions.
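
The three ingredients named in the abstract (an error rate, an LZ-based randomness check, and a Poisson probability) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which LZ variant or normalization the paper uses, so the phrase-counting parse below is an assumed LZ76-style variant, and the function names, the 0/1 error-flag encoding, the 1,000-cell example, and the 20-cells-per-record figure are all hypothetical.

```python
import math
import random


def error_rate(error_flags):
    """Number of cells in error divided by the total number of cells."""
    return sum(error_flags) / len(error_flags)


def lz_complexity(bits):
    """Count phrases in an LZ76-style parse of a 0/1 string (assumed variant).

    Each phrase is the shortest substring starting at the current position
    that has not appeared earlier in the sequence (overlap allowed).
    """
    i, phrases, n = 0, 0, len(bits)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and bits[i:i + length] in bits[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_at_least(k, lam):
    """P(X >= k) for a Poisson variable with mean lam."""
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(k))


# Hypothetical data: 1 marks a cell in error, 0 a correct cell.
systematic = "0000000001" * 100          # every tenth cell is wrong
cells = list(systematic)
random.Random(0).shuffle(cells)          # same errors, scattered positions
scattered = "".join(cells)

for name, bits in (("systematic", systematic), ("scattered", scattered)):
    flags = [int(b) for b in bits]
    print(name, "error rate:", error_rate(flags),
          "LZ phrases:", lz_complexity(bits))

# Assumed managerial question: chance that a 20-cell record has any error.
lam = error_rate([int(b) for b in scattered]) * 20
print("P(at least one error in a record):", poisson_at_least(1, lam))
```

Both example sequences have the same error rate, so the percentage alone cannot tell them apart, while the phrase count is much lower for the periodic pattern than for the shuffled one. The Poisson step is only meaningful for the scattered case, in line with the abstract's condition that the errors first be judged random.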