Data quality in massive data sets

  • Authors:
  • Michael F. Goodchild; Keith C. Clarke

  • Affiliations:
  • National Center for Geographic Information and Analysis, and Department of Geography, University of California, Santa Barbara, CA; National Center for Geographic Information and Analysis, and Department of Geography, University of California, Santa Barbara, CA

  • Venue:
  • Handbook of massive data sets
  • Year:
  • 2002

Abstract

All data contain errors, and large spatial data sets are especially error-prone because they combine data from multiple sources that make different assumptions about structure and semantics. The general issue is one of data quality assurance, with quality defined in terms of lineage, completeness, logical consistency, attribute accuracy, and positional accuracy. We review a series of quality metrics suitable for the empirical description of data quality, and consider some of the special quality issues that arise with spatial data, especially the need to incorporate visualizations of data quality into graphics and maps. We conclude that data quality is an essential component of software for spatial data handling, including geographic information systems.
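As one illustration of an empirical quality metric, positional accuracy is commonly summarized as the root-mean-square error (RMSE) between a data set's coordinates and those of an independent, more accurate reference source. The sketch below is not taken from the chapter. It assumes paired (x, y) points expressed in the same projected units, and the function name `positional_rmse` and the sample coordinates are hypothetical.

```python
import math


def positional_rmse(measured, reference):
    """Root-mean-square positional error between paired (x, y) points.

    Hypothetical helper: `measured` holds coordinates from the data set under
    test, and `reference` holds the same features from an assumed
    higher-accuracy source, in the same projected units (e.g. metres).
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equal-length, non-empty point lists")
    # Squared horizontal distance between each measured point and its reference.
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    measured = [(100.0, 200.0), (150.0, 250.0)]
    reference = [(103.0, 204.0), (147.0, 246.0)]
    print(positional_rmse(measured, reference))  # each point is 5 units off -> 5.0
```

A single RMSE figure hides how error varies across a map, which is one reason visualizing quality alongside the data matters for spatial data sets.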