Estimating and bounding aggregations in databases with referential integrity errors

Authors:
Javier García-García; Carlos Ordonez
Affiliations:
Universidad Nacional Autónoma de México, Mexico City, Mexico; University of Houston, Houston, TX, USA
Venue:
Proceedings of the ACM 11th international workshop on Data warehousing and OLAP
Year:
2008

Abstract

Database integration builds a single view over tables that come from multiple databases. Each source database has its own tables, columns whose content overlaps with columns in other databases, and its own referential integrity constraints. Thus, a query on an integrated database is likely to involve tables and columns with referential integrity errors. In a data warehouse environment, the ETL processes do handle referential integrity errors, but in many scenarios they do so by adding 'dummy' records to the dimension tables so that fact table rows with invalid references have something to relate to. When two tables are joined and aggregations are computed, the tuples with an undefined foreign key value are aggregated in a group marked as undefined, which effectively discards potentially valuable information. With that motivation in mind, we extend aggregate functions computed over tables with referential integrity errors in OLAP databases so that they return complete answer sets, in the sense that no tuple is excluded. We associate with each valid reference the probability that an invalid reference actually corresponds to it. The main idea of our work is that, in certain contexts, tuples with invalid references can still be used by taking into account the probability that an invalid reference is actually a particular correct reference. In this way, improved answer sets are obtained from aggregate queries in settings where a database violates referential integrity constraints.
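The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the in-memory row format, and the choice to estimate probabilities from the relative frequency of valid references are all assumptions made for illustration, since the abstract does not say how the probabilities are obtained. The bounds function additionally assumes non-negative measures and that each invalid tuple belongs to at most one valid group.

```python
def expected_sum_by_group(fact_rows, valid_keys, probabilities=None):
    """Probability-weighted SUM per valid foreign key (hypothetical helper).

    fact_rows: iterable of (foreign_key, measure) pairs.
    valid_keys: keys that exist in the referenced (dimension) table.
    probabilities: optional {key: p}; p is the probability that an invalid
        reference actually refers to that key. If omitted, it is estimated
        from the frequency of valid references (an assumption of this sketch).
    """
    rows = list(fact_rows)
    keys = list(valid_keys)
    valid = set(keys)

    if probabilities is None:
        counts = {k: 0 for k in keys}
        for fk, _ in rows:
            if fk in valid:
                counts[fk] += 1
        total = sum(counts.values())
        if total == 0:
            # No valid references observed: fall back to a uniform guess.
            probabilities = {k: 1.0 / len(keys) for k in keys}
        else:
            probabilities = {k: counts[k] / total for k in keys}

    totals = {k: 0.0 for k in keys}
    for fk, measure in rows:
        if fk in valid:
            totals[fk] += measure
        else:
            # Spread the measure of an invalid tuple over all valid groups.
            for k in keys:
                totals[k] += probabilities[k] * measure
    return totals


def sum_bounds(fact_rows, valid_keys):
    """(lower, upper) bounds on SUM per group, assuming measures >= 0."""
    rows = list(fact_rows)
    valid = set(valid_keys)
    invalid_total = sum(m for fk, m in rows if fk not in valid)
    lower = {k: 0.0 for k in valid_keys}
    for fk, m in rows:
        if fk in valid:
            lower[fk] += m
    # Upper bound: every invalid tuple is assumed to belong to this group.
    return {k: (lower[k], lower[k] + invalid_total) for k in valid_keys}


rows = [("A", 10.0), ("A", 30.0), ("A", 20.0), ("B", 20.0),
        (None, 12.0), ("Z", 4.0)]   # None and "Z" are invalid references
print(expected_sum_by_group(rows, ["A", "B"]))  # {'A': 72.0, 'B': 24.0}
print(sum_bounds(rows, ["A", "B"]))  # {'A': (60.0, 76.0), 'B': (20.0, 36.0)}
```

In this example the estimated group totals add up to 96.0, the sum over all fact rows, so no tuple is dropped from the answer set, whereas a plain join would leave 16.0 in an undefined group.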