HLS: Tunable Mining of Approximate Functional Dependencies
BNCOD '08 Proceedings of the 25th British national conference on Databases: Sharing Data, Information and Knowledge
Consistent Query Answering: The First Ten Years
SUM '08 Proceedings of the 2nd international conference on Scalable Uncertainty Management
Estimating the confidence of conditional functional dependencies
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
ICFCA '09 Proceedings of the 7th International Conference on Formal Concept Analysis
Conditional Dependencies: A Principled Approach to Improving Data Quality
BNCOD 26 Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier
Analyses and Validation of Conditional Dependencies with Built-in Predicates
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Proceedings of the VLDB Endowment
Consistent query answers from virtually integrated XML data
Journal of Systems and Software
Proceedings of the VLDB Endowment
Differential dependencies: Reasoning and discovery
ACM Transactions on Database Systems (TODS)
Improving data quality by source analysis
Journal of Data and Information Quality (JDIQ)
Proceedings of the 15th International Conference on Database Theory
Improving the Data Quality of Drug Databases using Conditional Dependencies and Ontologies
Journal of Data and Information Quality (JDIQ)
Inconsistency-Induced Learning for Perpetual Learners
International Journal of Software Science and Computational Intelligence
Comparable dependencies over heterogeneous data
The VLDB Journal — The International Journal on Very Large Data Bases
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Editorial: Efficient discovery of similarity constraints for matching dependencies
Data & Knowledge Engineering
ACM Transactions on Database Systems (TODS) - Invited papers issue
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs extend functional dependencies (FDs) with patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding CFDs is an expensive process that involves intensive manual effort. To identify data cleaning rules effectively, we develop techniques for discovering CFDs from sample relations. We provide three methods for CFD discovery. The first, CFDMiner, is based on techniques for mining closed itemsets and discovers constant CFDs, namely CFDs with constant patterns only. The other two algorithms discover general CFDs. One, CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs, and leverages closed-itemset mining to reduce the search space. Our experimental results demonstrate the following. (a) CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. (b) CTANE works well when a given sample relation is large, but it does not scale well with the arity of the relation. (c) FastCFD is far more efficient than CTANE when the arity of the relation is large.
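To make the notion of a CFD concrete, the following is a minimal Python sketch of checking one CFD against a small relation. It is not CFDMiner, CTANE, or FastCFD, which discover CFDs; it only validates a given one. The attribute names (`CC`, `AC`, `CITY`), the sample rows, the wildcard symbol `_`, and the function `cfd_violations` are all hypothetical and chosen for illustration. The sketch assumes a single right-hand-side attribute and a pattern that assigns each attribute either a constant or the wildcard.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed marker for "any value" in a pattern


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs, pattern).

    rows    : list of dicts mapping attribute name to value
    lhs     : list of left-hand-side attribute names
    rhs     : single right-hand-side attribute name
    pattern : dict giving a constant or WILDCARD for each attribute in lhs + [rhs]
    """
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # The CFD only constrains rows whose LHS values match the pattern.
        if any(pattern[a] != WILDCARD and row[a] != pattern[a] for a in lhs):
            continue
        if pattern[rhs] != WILDCARD:
            # Constant RHS: a single matching row can violate the CFD.
            if row[rhs] != pattern[rhs]:
                violations.add(i)
        else:
            # Variable RHS: matching rows that agree on the LHS must agree on the RHS.
            groups[tuple(row[a] for a in lhs)].append(i)
    for indices in groups.values():
        if len({rows[i][rhs] for i in indices}) > 1:
            violations.update(indices)
    return sorted(violations)


# Hypothetical sample relation: country code, area code, city.
rows = [
    {"CC": "44", "AC": "131", "CITY": "EDI"},
    {"CC": "44", "AC": "131", "CITY": "LDN"},
    {"CC": "01", "AC": "908", "CITY": "MH"},
    {"CC": "01", "AC": "908", "CITY": "MH"},
    {"CC": "01", "AC": "212", "CITY": "NYC"},
    {"CC": "01", "AC": "212", "CITY": "PHI"},
]

# Constant CFD: constant patterns only.
constant_cfd = {"CC": "44", "AC": "131", "CITY": "EDI"}
# General CFD: a constant on CC, wildcards elsewhere.
general_cfd = {"CC": "01", "AC": WILDCARD, "CITY": WILDCARD}
# Plain FD: the special case where every pattern entry is a wildcard.
plain_fd = {"CC": WILDCARD, "AC": WILDCARD, "CITY": WILDCARD}

print(cfd_violations(rows, ["CC", "AC"], "CITY", constant_cfd))
print(cfd_violations(rows, ["CC", "AC"], "CITY", general_cfd))
print(cfd_violations(rows, ["CC", "AC"], "CITY", plain_fd))
```

The three patterns mirror the distinction drawn in the abstract: a constant CFD fixes every attribute to a constant, a general CFD mixes constants and wildcards, and an ordinary FD uses wildcards throughout.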