Integrity constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires expert knowledge and time. Proposals for automatic discovery have been made for some formalisms, such as functional dependencies and their extension, conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example, an American citizen cannot have a lower salary and a higher tax rate than another citizen in the same state. In this paper, we tackle the challenges of discovering dependencies in a more expressive integrity constraint language, namely Denial Constraints (DCs). DCs are expressive enough to overcome the limits of previous languages and, at the same time, have enough structure to allow efficient discovery and application in several scenarios. We lay out theoretical and practical foundations for DCs, including a set of sound inference rules and a linear algorithm for implication testing. We then develop an efficient instance-driven DC discovery algorithm and propose a novel scoring function to rank DCs for user validation. Using real-world and synthetic datasets, we experimentally evaluate the scalability and effectiveness of our solution.
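To make the tax-rate rule concrete, the sketch below writes it as a denial constraint in the usual style, a negated conjunction of predicates over a pair of tuples: for all tuples t_a, t_b in R, NOT(t_a.State = t_b.State AND t_a.Salary < t_b.Salary AND t_a.TaxRate > t_b.TaxRate). It then checks a small relation for violating pairs. The schema (`Citizen`, `state`, `salary`, `tax_rate`), the helper names, and the sample rows are hypothetical and chosen for illustration only. The check is a naive quadratic scan over ordered tuple pairs, not the discovery algorithm or the implication test described in the abstract.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple


class Citizen(NamedTuple):
    # Hypothetical schema, assumed for illustration only.
    name: str
    state: str
    salary: int
    tax_rate: float


def violates(t_a: Citizen, t_b: Citizen) -> bool:
    """Return True if the ordered pair (t_a, t_b) satisfies every predicate
    of the DC, i.e. the forbidden conjunction holds and the DC is violated."""
    return (
        t_a.state == t_b.state
        and t_a.salary < t_b.salary
        and t_a.tax_rate > t_b.tax_rate
    )


def find_violations(relation: List[Citizen]) -> List[Tuple[Citizen, Citizen]]:
    """Naive O(n^2) check: test every ordered pair of distinct tuples."""
    return [(a, b) for a, b in permutations(relation, 2) if violates(a, b)]


# Made-up sample data, not taken from the paper's datasets.
rows = [
    Citizen("Ann", "NY", 50000, 0.20),
    Citizen("Bob", "NY", 60000, 0.25),
    Citizen("Cat", "NY", 40000, 0.30),
    Citizen("Dan", "CA", 30000, 0.40),
]

if __name__ == "__main__":
    for a, b in find_violations(rows):
        print(f"{a.name} earns less than {b.name} in {a.state} but pays a higher rate")
```

Pairs are ordered because the salary and tax-rate predicates are not symmetric: here (Cat, Ann) and (Cat, Bob) violate the rule, while (Ann, Cat) does not. Dan is the only tuple in CA, so he cannot be part of a violation.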