Extended dimensions for cleaning and querying inconsistent data warehouses

  • Authors:
  • Juan Ramírez;Loreto Bravo;Mónica Caniupán

  • Affiliations:
  • Universidad del Bío-Bío, Concepción, Chile;Universidad de Concepción, Concepción, Chile;Universidad del Bío-Bío, Concepción, Chile

  • Venue:
  • Proceedings of the sixteenth international workshop on Data warehousing and OLAP
  • Year:
  • 2013

Abstract

A dimension in a data warehouse (DW) is an abstract concept that groups data that share a common semantic meaning. Dimensions are modeled using a hierarchical schema of categories. A dimension is called strict if every element of each category has exactly one ancestor in each parent category, and covering if each element of a category has at least one ancestor in each parent category. If a dimension is strict and covering, we can use pre-computed results at lower levels to answer queries at higher levels. This capability of computing summaries is vital for efficiency. Nevertheless, when dimensions are not strict or covering, it is important to know their strictness and covering constraints in order to keep the capability of obtaining correct summarizations. Real-world dimensions might fail to satisfy these constraints, and in these cases it is important to find ways to fix the dimensions (correct them) or to get correct answers to queries posed on inconsistent dimensions. A minimal repair is a new dimension that satisfies the strictness and covering constraints and that is obtained from the original dimension through a minimum number of changes. The set of minimal repairs can be used as a tool to compute answers to aggregate queries in the presence of inconsistencies. However, computing all of them is NP-hard. In this paper, instead of trying to find all possible minimal repairs, we define a single compatible repair that is consistent with respect to both strictness and covering constraints, is close to the inconsistent dimension, can be computed efficiently, and can be used to compute approximate answers to aggregate queries. In order to define the compatible repair, we define the notion of extended dimension, which supports sets of elements in categories.
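
The strictness and covering conditions described in the abstract can be illustrated with a small check over a toy dimension. The following is a minimal sketch, not the paper's algorithm: the representation (a `schema` mapping each category to its parent categories, a `category_of` mapping, and a `parents` rollup relation), the function names, and the sample Store/City/Country data are all hypothetical. It assumes that more than one ancestor in a parent category counts as a strictness violation and that no ancestor in a parent category counts as a covering violation.

```python
def ancestors_in(element, target_category, parents, category_of):
    """Collect all (transitive) ancestors of `element` in `target_category`."""
    found, seen, stack = set(), set(), [element]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
                if category_of[parent] == target_category:
                    found.add(parent)
    return found


def find_violations(schema, category_of, parents):
    """Return (strictness_violations, covering_violations) as sorted lists
    of (element, parent_category) pairs. Hypothetical helper for illustration."""
    strict, covering = [], []
    for element, category in category_of.items():
        for parent_category in schema[category]:
            count = len(ancestors_in(element, parent_category, parents, category_of))
            if count > 1:        # assumed reading: several ancestors break strictness
                strict.append((element, parent_category))
            elif count == 0:     # assumed reading: no ancestor breaks covering
                covering.append((element, parent_category))
    return sorted(strict), sorted(covering)


# Hypothetical toy dimension: Store -> City -> Country.
schema = {"Store": ["City"], "City": ["Country"], "Country": []}
category_of = {
    "s1": "Store", "s2": "Store", "s3": "Store",
    "Concepción": "City", "Chillán": "City",
    "Chile": "Country",
}
parents = {
    "s1": ["Concepción"],
    "s2": ["Concepción", "Chillán"],  # two cities: not strict
    "Concepción": ["Chile"],
    # "s3" and "Chillán" have no parent: not covering
}

strict_violations, covering_violations = find_violations(schema, category_of, parents)
print(strict_violations)    # [('s2', 'City')]
print(covering_violations)  # [('Chillán', 'Country'), ('s3', 'City')]
```

In this toy dimension, sales for store `s2` would be counted twice when rolling up to City, and sales for `s3` would be lost, which is why pre-computed lower-level results cannot be reused safely until such violations are repaired.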