An Accuracy Metric: Percentages, Randomness, and Probabilities

  • Authors: Craig W. Fisher; Eitel J. M. Lauria; Carolyn C. Matheus

  • Affiliations: Marist College; Marist College; State University of New York at Albany

  • Venue: Journal of Data and Information Quality (JDIQ)
  • Year: 2009

Abstract

Practitioners and researchers regularly refer to error rates or accuracy percentages of databases. The former is the number of cells in error divided by the total number of cells; the latter is the number of correct cells divided by the total number of cells. However, databases may have similar error rates (or accuracy percentages) yet differ drastically in the complexity of their accuracy problems. A simple percentage does not indicate whether the errors are systematic or randomly distributed throughout the database. We expand the accuracy metric to include a randomness measure and a probability distribution value. The proposed randomness check is based on the Lempel-Ziv (LZ) complexity measure. Through two simulation studies we show that the LZ complexity measure can clearly distinguish random errors from systematic ones. This determination is a significant first step and a major departure from the percentage-alone technique. Once the errors are determined to be random, a Poisson probability distribution is used to help address various managerial questions.
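
The three ingredients named in the abstract (a percentage, a randomness check, and a Poisson probability) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the database is flattened into a sequence of 0/1 flags (1 marks a cell in error), the LZ measure is taken to be the phrase count of the 1976 Lempel-Ziv parsing, and the function names, the 5% error rate, and the 100-cell sample are all hypothetical.

```python
import math
import random


def error_rate(flags):
    """Cells in error divided by total cells (flags: 1 = error, 0 = correct)."""
    return sum(flags) / len(flags)


def lz76_complexity(flags):
    """Number of phrases in the Lempel-Ziv (1976) parsing of a 0/1 sequence.

    Each phrase is the shortest run starting at position i that has not
    already appeared earlier in the sequence (overlap with itself allowed).
    """
    s = "".join(str(b) for b in flags)
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def poisson_pmf(k, lam):
    """P(exactly k errors) for a Poisson distribution with mean lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """P(at most k errors) for a Poisson distribution with mean lam > 0."""
    return sum(poisson_pmf(j, lam) for j in range(k + 1))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable

    # Two hypothetical databases with the same 5% error rate.
    systematic = [0] * 950 + [1] * 50   # all errors in one contiguous block
    scattered = systematic[:]
    rng.shuffle(scattered)              # same errors, randomly placed

    for name, flags in [("systematic", systematic), ("scattered", scattered)]:
        print(name, error_rate(flags), lz76_complexity(flags))

    # Hypothetical managerial question: if errors are random at a 5% rate,
    # how likely is a 100-cell sample to contain at most 2 errors?
    lam = 0.05 * 100
    print("P(at most 2 errors):", poisson_cdf(2, lam))
```

Both sequences report the same error rate, but the block of contiguous errors parses into far fewer phrases than the shuffled one, which is the kind of contrast a percentage alone cannot show. The Poisson step is meaningful only after the randomness check suggests the errors are not systematic.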