OLAP over imprecise data with domain constraints

  • Authors: Doug Burdick; AnHai Doan; Raghu Ramakrishnan; Shivakumar Vaithyanathan

  • Affiliations: University of Wisconsin - Madison; University of Wisconsin - Madison; Yahoo! Research; IBM Almaden Research Center

  • Venue: VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
  • Year: 2007

Abstract

Several recent papers have focused on OLAP over imprecise data, where each fact can be a region, instead of a point, in a multi-dimensional space. They have provided a multiple-world semantics for such data, and developed efficient ways to answer OLAP aggregation queries over the imprecise facts. These solutions, however, assume that the imprecise facts can be interpreted independently of one another, a key assumption that is often violated in practice. Indeed, imprecise facts in real-world applications are often correlated, and such correlations can be captured as domain integrity constraints (e.g., repairs with the same customer names and models took place in the same city, or a text span can refer to a person or a city, but not both). In this paper we provide a framework for answering OLAP aggregation queries over imprecise data in the presence of such domain constraints. We first describe a relatively simple yet powerful constraint language, and formalize what it means to take into account such constraints in query answering. Next, we prove that OLAP queries can be answered efficiently given a database D* of fact marginals. We then exploit the regularities in the constraint space (captured in a constraint hypergraph) and the fact space to efficiently construct D*. We present extensive experiments over real-world and synthetic data to demonstrate the effectiveness of our approach.
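
To make the abstract's notions concrete, the following is a minimal brute-force sketch of imprecise facts, possible worlds, a domain constraint, and fact marginals. It is not the paper's algorithm: the abstract says D* is constructed efficiently using a constraint hypergraph, whereas this toy simply enumerates every world. The fact table, the field names, the function names, and the assumption that all constraint-satisfying worlds are equally likely are hypothetical choices made for illustration only.

```python
from itertools import product
from collections import defaultdict

# Hypothetical imprecise facts: each repair has a known cost but only a
# set of candidate cities (a region rather than a point in the City dimension).
facts = {
    "r1": {"customer": "Ann", "model": "Civic", "cities": ["Madison", "Milwaukee"], "cost": 100.0},
    "r2": {"customer": "Ann", "model": "Civic", "cities": ["Madison", "Chicago"], "cost": 50.0},
    "r3": {"customer": "Bob", "model": "Camry", "cities": ["Madison", "Chicago"], "cost": 200.0},
}

def satisfies_constraint(world):
    """Domain constraint from the abstract's example: repairs with the same
    customer name and model took place in the same city."""
    seen = {}
    for fid, city in world.items():
        key = (facts[fid]["customer"], facts[fid]["model"])
        if seen.setdefault(key, city) != city:
            return False
    return True

def valid_worlds():
    """Enumerate every assignment of one city per fact, keeping only the
    worlds that satisfy the constraint (exponential; illustration only)."""
    ids = sorted(facts)
    for choice in product(*(facts[f]["cities"] for f in ids)):
        world = dict(zip(ids, choice))
        if satisfies_constraint(world):
            yield world

def fact_marginals():
    """Marginal probability that each fact lands in each city, assuming
    (hypothetically) a uniform distribution over valid worlds."""
    worlds = list(valid_worlds())
    marginals = defaultdict(float)
    for world in worlds:
        for fid, city in world.items():
            marginals[(fid, city)] += 1.0 / len(worlds)
    return dict(marginals)

def expected_sum(marginals, city):
    """Expected SUM(cost) for one city, computed from marginals alone."""
    return sum(p * facts[fid]["cost"] for (fid, c), p in marginals.items() if c == city)

def expected_sum_by_worlds(city):
    """Same quantity computed directly by averaging over valid worlds."""
    worlds = list(valid_worlds())
    total = sum(facts[fid]["cost"] for w in worlds for fid, c in w.items() if c == city)
    return total / len(worlds)

marginals = fact_marginals()
madison_total = expected_sum(marginals, "Madison")
```

In this toy data the constraint forces r1 and r2 into Madison, the only city their candidate sets share, so the facts can no longer be interpreted independently. The expected SUM computed from the marginals agrees with the average over valid worlds by linearity of expectation, which gives some intuition for why a table of fact marginals can be enough to answer such a query. The paper's actual result and the class of queries it covers may differ from this sketch.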