Discovering denial constraints

Authors: Xu Chu; Ihab F. Ilyas; Paolo Papotti
Affiliations: University of Waterloo; QCRI; QCRI
Venue: Proceedings of the VLDB Endowment
Year: 2013

Abstract

Integrity constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires experts and time. Automatic discovery has been proposed for some formalisms, such as functional dependencies and their extension, conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example, an American citizen cannot have a lower salary and a higher tax rate than another citizen in the same state. In this paper, we tackle the challenges of discovering dependencies in a more expressive integrity constraint language, namely Denial Constraints (DCs). DCs are expressive enough to overcome the limits of previous languages and, at the same time, have enough structure to allow efficient discovery and application in several scenarios. We lay out theoretical and practical foundations for DCs, including a set of sound inference rules and a linear algorithm for implication testing. We then develop an efficient instance-driven DC discovery algorithm and propose a novel scoring function to rank DCs for user validation. Using real-world and synthetic datasets, we experimentally evaluate the scalability and effectiveness of our solution.
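The salary and tax-rate rule can be read as a denial constraint over pairs of tuples: for all tuples t_a, t_b, not (t_a.state = t_b.state and t_a.salary < t_b.salary and t_a.tax_rate > t_b.tax_rate). The Python sketch below shows what it means for an instance to violate such a DC. It rests on assumptions: the encoding (a DC as a tuple of hypothetical `Predicate` objects over rows stored as dicts), the attribute names, and the sample rows are all invented for illustration. The quadratic pair scan is a brute-force check, not the paper's discovery or implication-testing algorithm.

```python
import operator
from dataclasses import dataclass
from itertools import permutations
from typing import Callable

# Comparison operators a predicate may use (hypothetical encoding).
OPS: dict[str, Callable[[object, object], bool]] = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass(frozen=True)
class Predicate:
    """Compares attribute `left` of tuple t_a with attribute `right` of tuple t_b."""
    left: str
    op: str
    right: str

    def holds(self, ta: dict, tb: dict) -> bool:
        return OPS[self.op](ta[self.left], tb[self.right])


# A DC forbids any ordered pair of distinct tuples that satisfies ALL of its predicates.
DenialConstraint = tuple[Predicate, ...]


def violations(dc: DenialConstraint, rows: list[dict]) -> list[tuple[int, int]]:
    """Return the index pairs (i, j), i != j, whose tuples jointly satisfy every predicate."""
    return [
        (i, j)
        for (i, ta), (j, tb) in permutations(enumerate(rows), 2)
        if all(p.holds(ta, tb) for p in dc)
    ]


# Same state, lower salary, higher tax rate: this combination is forbidden.
tax_dc: DenialConstraint = (
    Predicate("state", "=", "state"),
    Predicate("salary", "<", "salary"),
    Predicate("tax_rate", ">", "tax_rate"),
)

rows = [
    {"state": "NY", "salary": 50_000, "tax_rate": 0.20},
    {"state": "NY", "salary": 80_000, "tax_rate": 0.25},
    {"state": "NY", "salary": 60_000, "tax_rate": 0.30},  # taxed more than the 80k earner
    {"state": "CA", "salary": 40_000, "tax_rate": 0.35},  # different state, never compared
]

print(violations(tax_dc, rows))  # prints [(2, 1)]
```

A functional dependency fits the same form: zip -> state becomes the DC (Predicate("zip", "=", "zip"), Predicate("state", "!=", "state")), which is one way to see why DCs subsume FDs while also allowing order comparisons such as < and >. The scan evaluates every ordered pair, so checking one DC costs O(n²) pair tests on n tuples.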