CORDS: automatic generation of correlation statistics in DB2
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but it can also cause query optimizers, which usually assume that columns are statistically independent, to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values in order to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus primarily on the use of CORDS in query optimization. Specifically, CORDS recommends groups of columns on which to maintain certain simple joint statistics. These "column-group" statistics are then used by the optimizer to avoid naive selectivity estimates based on inappropriate independence assumptions. This approach, because of its simplicity and judicious use of sampling, is relatively easy to implement in existing commercial systems, has very low overhead, and scales well to the large numbers of columns and large table sizes found in real-world databases. Experiments with a prototype implementation show that the use of CORDS in query optimization can speed up query execution times by an order of magnitude. CORDS can be used in tandem with query feedback systems such as the LEO learning optimizer, leveraging the infrastructure of such systems to correct bad selectivity estimates and ameliorating the poor performance of feedback systems during slow learning phases.
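The two sample-based tests and the column-group estimate described above can be illustrated with a minimal Python sketch. It is not the CORDS implementation. The toy table, the function names, the normalization of the chi-squared statistic, the distinct-count ratio used as a soft functional dependency score, and the uniform-frequency selectivity formula are all assumptions made for illustration. The abstract states only that a chi-squared analysis and distinct-value counts are applied to a sample.

```python
import random
from collections import Counter
from itertools import product


def chi_squared(pairs):
    """Pearson chi-squared statistic of the contingency table of (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for (a, n_a), (b, n_b) in product(left.items(), right.items()):
        expected = n_a * n_b / n  # expected cell count under independence
        stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def mean_square_contingency(pairs):
    """Chi-squared scaled into [0, 1]; 0 means independent in the sample.

    Assumed normalization: chi2 / (n * (min(d_a, d_b) - 1)).
    """
    d = min(len({a for a, _ in pairs}), len({b for _, b in pairs}))
    if d < 2:
        return 0.0
    return chi_squared(pairs) / (len(pairs) * (d - 1))


def soft_fd_strength(pairs):
    """Assumed score for a soft FD a => b: |distinct a| / |distinct (a, b)|.

    A value near 1 means each a-value occurs with (almost) one b-value.
    """
    return len({a for a, _ in pairs}) / len(set(pairs))


def equality_selectivity(pairs):
    """Estimates for 'A = x AND B = y', assuming uniform value frequencies.

    Returns (independence estimate, column-group estimate).
    """
    d_a = len({a for a, _ in pairs})
    d_b = len({b for _, b in pairs})
    d_ab = len(set(pairs))
    return 1 / (d_a * d_b), 1 / d_ab


# Hypothetical table: model determines make, color is unrelated to both.
MODEL_TO_MAKE = {
    "Accord": "Honda", "Civic": "Honda",
    "Camry": "Toyota", "Corolla": "Toyota",
    "F-150": "Ford", "Focus": "Ford",
}
COLORS = ["red", "blue", "black", "white"]
table = [
    (model, make, color)
    for model, make in MODEL_TO_MAKE.items()
    for color in COLORS
    for _ in range(50)
]

# Analyze a fixed-size random sample rather than the full table.
sample = random.Random(42).sample(table, 500)
model_make = [(model, make) for model, make, _ in sample]
make_color = [(make, color) for _, make, color in sample]

print("model/make contingency:", mean_square_contingency(model_make))
print("make/color contingency:", mean_square_contingency(make_color))
print("model => make strength:", soft_fd_strength(model_make))
print("make => color strength:", soft_fd_strength(make_color))
print("selectivity (independence, column-group):",
      equality_selectivity(model_make))
```

In this toy data the pair (model, make) is a perfect functional dependency, so the independence-based estimate for a predicate on both columns is smaller than the column-group estimate. This is the kind of underestimate that the abstract attributes to inappropriate independence assumptions.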