Dirty data is a serious problem for businesses: it leads to incorrect decisions and inefficient daily operations, and ultimately wastes both time and money. Dirty data often arises when domain constraints and business rules, meant to preserve data consistency and accuracy, are enforced incompletely or not at all in application code. In this work, we propose a new data-driven tool that can be used within an organization's data quality management process to suggest possible rules and to identify conformant and non-conformant records. Data quality rules are known to be contextual, so we focus on the discovery of context-dependent rules. Specifically, we search for conditional functional dependencies (CFDs), that is, functional dependencies that hold only over a portion of the data. The output of our tool is a set of functional dependencies together with the context in which they hold (for example, a rule stating that, for CS graduate courses, the course number and term functionally determine the room and instructor). Since the input to our tool will likely be a dirty database, we also search for CFDs that almost hold, and we return these rules together with the non-conformant records, as these are potentially dirty. We present effective algorithms for discovering CFDs and dirty values in a data instance. Our discovery algorithm searches for minimal CFDs among the data values and prunes redundant candidates. No universal objective measures of data quality or of data quality rules are known. Hence, to avoid returning an unnecessarily large number of CFDs and to return only those that are most interesting, we evaluate a set of interest metrics and present comparative results using real datasets. We also present an experimental study showing the scalability of our techniques.
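
To make the notion of a CFD that "almost holds" concrete, the following minimal sketch checks one candidate rule against a small table and reports the non-conformant records. It is an illustration only, not the discovery algorithm of this work: the function name `check_cfd`, the majority-vote definition of confidence, and the sample `courses` rows are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def check_cfd(rows, condition, lhs, rhs):
    """Check the dependency lhs -> rhs on the rows matching `condition`.

    Returns (confidence, violations). Confidence is the fraction of
    in-context rows that agree with the most common rhs value of their
    lhs group (an assumed measure, chosen for simplicity).
    """
    # The context: rows whose attributes equal the condition's constants.
    in_context = [r for r in rows
                  if all(r[attr] == val for attr, val in condition.items())]

    # Group in-context rows by their lhs values.
    groups = defaultdict(list)
    for r in in_context:
        groups[tuple(r[a] for a in lhs)].append(r)

    # Within each group, rows that disagree with the majority rhs value
    # are reported as potentially dirty.
    violations = []
    for group in groups.values():
        counts = Counter(tuple(r[a] for a in rhs) for r in group)
        majority, _ = counts.most_common(1)[0]
        violations.extend(r for r in group
                          if tuple(r[a] for a in rhs) != majority)

    if not in_context:
        return 1.0, violations
    return 1 - len(violations) / len(in_context), violations


# Hypothetical sample data, invented for this example.
courses = [
    {"dept": "CS", "level": "grad", "number": 500, "term": "F09", "room": "A1", "instructor": "Lee"},
    {"dept": "CS", "level": "grad", "number": 500, "term": "F09", "room": "A1", "instructor": "Lee"},
    {"dept": "CS", "level": "grad", "number": 500, "term": "F09", "room": "B2", "instructor": "Lee"},
    {"dept": "CS", "level": "grad", "number": 510, "term": "F09", "room": "C3", "instructor": "Kim"},
    {"dept": "CS", "level": "undergrad", "number": 100, "term": "F09", "room": "D4", "instructor": "Roy"},
    {"dept": "CS", "level": "undergrad", "number": 100, "term": "F09", "room": "E5", "instructor": "Roy"},
]

confidence, violations = check_cfd(
    courses,
    condition={"dept": "CS", "level": "grad"},
    lhs=["number", "term"],
    rhs=["room", "instructor"],
)
print(confidence, violations)
```

On this sample, the rule holds for three of the four CS graduate rows, so the sketch reports a confidence of 0.75 and flags the single row with room "B2" as potentially dirty. The undergraduate rows fall outside the context and are ignored, which is what distinguishes a conditional dependency from an ordinary one.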