Extended aggregations for databases with referential integrity issues

Authors:
Javier García-García;Carlos Ordonez
Affiliations:
Universidad Nacional Autónoma de México, Facultad de Ciencias, UNAM, Mexico City, CU 04510, Mexico;University of Houston, Department of Computer Science, Houston, TX 77204, USA
Venue:
Data & Knowledge Engineering
Year:
2010

Abstract

Querying inconsistent databases remains a broad and difficult problem. In this work, we study how to improve aggregations computed on databases with referential errors in the context of database integration, where the source databases have different tables, share columns with similar content, but enforce different referential integrity constraints. Thus, a query on an integrated database may involve tables and columns with referential integrity errors. In a data warehouse, even though the ETL processes fix referential integrity errors, this is generally done by inserting "dummy" records into the dimension tables for such invalid foreign keys, thereby enforcing referential integrity artificially. When two tables are joined and aggregations are computed, rows with an invalid or null foreign key value are skipped, effectively eliminating potentially valuable information. With that motivation in mind, we extend SQL aggregate functions computed over tables with referential integrity issues to return complete answer sets, in the sense that no row is excluded. We associate with each referenced key in the dimension table a probability that invalid or null foreign keys refer to it. Our main idea is to compute aggregations over joined tables, including rows with invalid or null references, by distributing their contribution to aggregation totals based on probabilities computed over correct foreign keys. Experiments with real and synthetic databases evaluate the usefulness, accuracy and performance of our extended aggregations.
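
To make the main idea concrete, the following is a minimal Python sketch of a grouped SUM that redistributes rows with invalid or null foreign keys instead of dropping them. It is an illustration under stated assumptions, not the paper's definition: the function name `extended_sum`, the sample data, and the choice to estimate each key's probability as its relative frequency among the valid foreign keys are all hypothetical.

```python
from collections import defaultdict


def extended_sum(fact_rows, dimension_keys):
    """Grouped SUM that keeps rows whose foreign key is invalid or null.

    fact_rows: iterable of (foreign_key, measure) pairs.
    dimension_keys: set of keys present in the referenced (dimension) table.
    Assumption: the probability that an invalid reference points to a key is
    that key's relative frequency among the valid foreign keys.
    """
    if not dimension_keys:
        return {}

    valid_totals = defaultdict(float)  # sum of measures per valid key
    valid_counts = defaultdict(int)    # number of valid references per key
    orphan_total = 0.0                 # measures of invalid or null rows

    for fk, measure in fact_rows:
        if fk in dimension_keys:
            valid_totals[fk] += measure
            valid_counts[fk] += 1
        else:
            orphan_total += measure

    n_valid = sum(valid_counts.values())
    result = {}
    for key in dimension_keys:
        # Fall back to a uniform distribution when no valid reference exists.
        if n_valid:
            p = valid_counts[key] / n_valid
        else:
            p = 1.0 / len(dimension_keys)
        result[key] = valid_totals[key] + p * orphan_total
    return result


# Hypothetical data: "Z" is not in the dimension table and None is a null key.
DIMENSION_KEYS = {"A", "B"}
FACT_ROWS = [
    ("A", 10.0), ("A", 30.0), ("A", 20.0),
    ("B", 40.0),
    (None, 8.0), ("Z", 12.0),
]

totals = extended_sum(FACT_ROWS, DIMENSION_KEYS)
print(totals)
```

In this example a plain join followed by SUM would report 60 for key A and 40 for key B, silently losing the 20 units carried by the two orphan rows. The sketch assigns three quarters of that amount to A and one quarter to B, so the group totals add up to the sum over all fact rows.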