OLAP over imprecise data with domain constraints

Authors: Doug Burdick (University of Wisconsin - Madison); AnHai Doan (University of Wisconsin - Madison); Raghu Ramakrishnan (Yahoo! Research); Shivakumar Vaithyanathan (IBM Almaden Research Center)
Venue: VLDB '07, Proceedings of the 33rd International Conference on Very Large Data Bases
Year: 2007

Abstract

Several recent papers have focused on OLAP over imprecise data, where each fact can be a region, rather than a point, in a multidimensional space. They have provided a multiple-world semantics for such data and developed efficient ways to answer OLAP aggregation queries over the imprecise facts. These solutions, however, assume that the imprecise facts can be interpreted independently of one another, a key assumption that is often violated in practice. Imprecise facts in real-world applications are often correlated, and such correlations can be captured as domain integrity constraints (e.g., repairs with the same customer name and model took place in the same city, or a text span can refer to a person or a city, but not both). In this paper we provide a framework for answering OLAP aggregation queries over imprecise data in the presence of such domain constraints. We first describe a relatively simple yet powerful constraint language and formalize what it means to take such constraints into account in query answering. Next, we prove that OLAP queries can be answered efficiently given a database D* of fact marginals. We then exploit the regularities in the constraint space (captured in a constraint hypergraph) and in the fact space to construct D* efficiently. We present extensive experiments over real-world and synthetic data to demonstrate the effectiveness of our approach.
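The central idea, that once constraints are accounted for an aggregation query can be answered from per-fact marginals D*, can be illustrated with a small brute-force sketch. Everything here is hypothetical: the facts, the city dimension, the prior weights, and the equality constraint are invented for illustration. The sketch also assumes that each world is weighted by the product of independent per-fact priors and renormalized over the worlds that satisfy the constraint. It enumerates every possible world, which is exponential in the number of facts. The paper instead exploits the constraint hypergraph to build D* efficiently, and this sketch does not reproduce that algorithm.

```python
from collections import defaultdict
from itertools import product

# Hypothetical imprecise facts: fact_id -> (measure, {candidate city: prior weight}).
# A fact with several candidate cells is a "region"; a single candidate is a point.
FACTS = {
    "r1": (100.0, {"Madison": 0.5, "Milwaukee": 0.5}),
    "r2": (50.0, {"Madison": 0.8, "Milwaukee": 0.2}),
    "r3": (20.0, {"Milwaukee": 1.0}),
}


def same_city(world):
    """Hypothetical domain constraint: r1 and r2 (same customer and model) share a city."""
    return world["r1"] == world["r2"]


def possible_worlds(facts):
    """Yield (world, weight) pairs; weight is the product of per-fact priors (assumption)."""
    ids = list(facts)
    options = [list(facts[f][1].items()) for f in ids]
    for combo in product(*options):
        world = {f: cell for f, (cell, _) in zip(ids, combo)}
        weight = 1.0
        for _, w in combo:
            weight *= w
        yield world, weight


def fact_marginals(facts, constraint):
    """D*-style table by brute force: P(fact lies in cell | constraint holds)."""
    valid = [(w, p) for w, p in possible_worlds(facts) if constraint(w)]
    total = sum(p for _, p in valid)
    marginals = {f: defaultdict(float) for f in facts}
    for world, p in valid:
        for f, cell in world.items():
            marginals[f][cell] += p / total
    return {f: dict(m) for f, m in marginals.items()}


def expected_sum(facts, marginals, cell):
    """Expected SUM for one cell; by linearity of expectation only marginals are needed."""
    return sum(facts[f][0] * marginals[f].get(cell, 0.0) for f in facts)


def expected_sum_by_worlds(facts, constraint, cell):
    """Reference answer: weighted average of the SUM over constraint-satisfying worlds."""
    valid = [(w, p) for w, p in possible_worlds(facts) if constraint(w)]
    total = sum(p for _, p in valid)
    return sum(
        p / total * sum(facts[f][0] for f, c in w.items() if c == cell)
        for w, p in valid
    )


if __name__ == "__main__":
    d_star = fact_marginals(FACTS, same_city)
    print(d_star)
    print(expected_sum(FACTS, d_star, "Madison"))
    print(expected_sum_by_worlds(FACTS, same_city, "Madison"))
```

With the constraint, r1 and r2 each end up with weight 0.8 on Madison and 0.2 on Milwaukee, and both query functions return 120 for Madison (up to floating-point rounding). Dropping the constraint (passing `lambda w: True`) leaves r1 split 0.5/0.5. That shift is exactly the correlation an independence assumption would miss. Only the marginal computation depends on the constraints. The aggregation step reads D* alone.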