How good is that data in the warehouse?

  • Author: John M. Artz
  • Affiliation: George Washington University
  • Venue: ACM SIGMIS Database
  • Year: 1997

Abstract

A data warehouse is an analytical database used for decision support. Data are copied from production databases, cleaned up, and possibly renormalized (i.e., denormalized for performance or normalized to create correct record structures). If the resulting records are normalized incorrectly, or if users do not understand how the records have been denormalized, a phenomenon called semantic disintegrity may occur: a user submits a query and receives an answer, but it is not the answer to the question they believe they asked. Thus, an understanding of normalization is critically important for both database designers and database users. Unfortunately, normalization relies on a series of heuristics that, in turn, assume that the database designer or user has an innate mental logic for understanding data dependencies and their implications. The quality of answers derived from the data warehouse therefore also depends on the existence of this innate mental logic. To assess whether this assumption is reasonable, an empirical test was constructed to determine whether subjects have an innate mental logic for understanding data dependencies and their implications.
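To make semantic disintegrity concrete, here is a minimal sketch of how a denormalized record can silently change the meaning of a query. It is an illustration only, not an example from the study: the `OrderRow` record, its fields, and the sample rows are hypothetical. The sketch assumes a warehouse table in which each customer's credit limit has been copied onto every one of that customer's order rows. Summing that column answers a different question than "what is the total credit extended to our customers?"

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRow:
    # Hypothetical denormalized warehouse record: credit_limit depends only on
    # customer, but it is repeated on every order row for that customer.
    customer: str
    credit_limit: int
    order_amount: int


ROWS = [
    OrderRow("Acme", 1000, 200),
    OrderRow("Acme", 1000, 350),
    OrderRow("Bolt", 500, 100),
]


def naive_total_credit(rows):
    # The query the user actually submits: sum the column over all rows.
    # Because credit_limit is repeated per order, this really computes
    # "credit limits weighted by each customer's number of orders".
    return sum(r.credit_limit for r in rows)


def intended_total_credit(rows):
    # The question the user believes they asked: one credit limit per customer.
    # Since customer -> credit_limit is a functional dependency, keep a single
    # value per customer before summing.
    per_customer = {r.customer: r.credit_limit for r in rows}
    return sum(per_customer.values())


if __name__ == "__main__":
    print(naive_total_credit(ROWS))     # 2500
    print(intended_total_credit(ROWS))  # 1500
```

Both functions return a plausible-looking number. Only a user who recognizes the dependency of `credit_limit` on `customer`, and knows that the table repeats it, can tell which number answers the intended question.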