To study data dependencies over heterogeneous data in dataspaces, we define a general dependency form, comparable dependencies (CDs), which specify constraints on comparable attributes. CDs cover the semantics of a broad class of database dependencies, including functional dependencies (FDs), metric functional dependencies (MFDs), and matching dependencies (MDs). As we illustrate, comparable dependencies are useful in dataspace applications such as semantic query optimization. Because data in dataspaces are heterogeneous, the first question, known as the validation problem, is whether a dependency (almost) holds in a data instance. Unfortunately, we prove that the validation problem with a given error or confidence guarantee is hard in general. In fact, the confidence validation problem is NP-hard even to approximate to within any constant factor. Nevertheless, we develop several approaches for efficient approximate computation, including greedy and randomized approaches whose approximation bound depends on the maximum number of violations that a single object may introduce. Finally, an extensive experimental evaluation on real data verifies the superiority of our methods.
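To make the validation problem concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simplified, metric-style dependency over two numeric attributes: rows whose left-hand values lie within `lhs_tol` must have right-hand values within `rhs_tol`. The function names, tolerances, and sample rows are hypothetical. The greedy step repeatedly drops the row involved in the most remaining violations, which gives a heuristic estimate of how many rows must be removed for the dependency to hold.

```python
from itertools import combinations


def violations(rows, lhs, rhs, lhs_tol, rhs_tol):
    """Return index pairs that are close on `lhs` but not close on `rhs`.

    Assumption: both attributes are numeric and compared by absolute difference.
    """
    pairs = []
    for i, j in combinations(range(len(rows)), 2):
        close_lhs = abs(rows[i][lhs] - rows[j][lhs]) <= lhs_tol
        close_rhs = abs(rows[i][rhs] - rows[j][rhs]) <= rhs_tol
        if close_lhs and not close_rhs:
            pairs.append((i, j))
    return pairs


def greedy_removal(pairs):
    """Greedily remove the row with the most remaining violations."""
    remaining = set(pairs)
    removed = []
    while remaining:
        # Count how many remaining violations each row takes part in.
        counts = {}
        for i, j in remaining:
            counts[i] = counts.get(i, 0) + 1
            counts[j] = counts.get(j, 0) + 1
        # Ties are broken by the smallest row index.
        worst = max(sorted(counts), key=counts.get)
        removed.append(worst)
        remaining = {p for p in remaining if worst not in p}
    return removed


# Hypothetical sample data: rows with the same zip should have similar prices.
rows = [
    {"zip": 10, "price": 100},
    {"zip": 10, "price": 101},
    {"zip": 10, "price": 500},
    {"zip": 99, "price": 7},
]
pairs = violations(rows, "zip", "price", lhs_tol=0, rhs_tol=5)
removed = greedy_removal(pairs)
error = len(removed) / len(rows)  # fraction of rows dropped by the heuristic
```

In this sketch the dependency "almost holds" when `error` is below a chosen threshold. Because the greedy choice is only a heuristic, the value it returns is an upper estimate of the true minimum number of removals.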