The goal of data fusion is to combine several representations of one real-world object into a single, consistent representation, e.g., in data integration. A very popular operator for data fusion is the minimum union operator, defined as the outer union followed by the removal of subsumed tuples. Minimum union is used in other applications as well, for instance in database query optimization to rewrite outer join queries and in the Semantic Web community to implement SPARQL's OPTIONAL operator. Despite its wide applicability, there are only a few efficient implementations, and until now minimum union has not been a relational database primitive. This paper fills this gap by presenting implementations of subsumption that serve as a building block for minimum union. Furthermore, we consider this operator as a database primitive and show how to optimize query plans in the presence of subsumption and minimum union through rule-based plan transformations. Experiments on both artificial and real-world data show that our algorithms outperform existing subsumption algorithms in terms of runtime and that they scale to large volumes of data. In the context of data integration, we observe that data fusion calls for more than subsumption and minimum union. Therefore, another contribution of this paper is the definition of the complementation and complement union operators. Intuitively, these allow merging tuples that have complementing values and thus eliminate unnecessary null values.
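To make the definition of minimum union concrete, the following is a minimal sketch in Python. It is a naive, quadratic illustration and not one of the algorithms proposed in the paper. It assumes the usual notion of subsumption: a tuple is subsumed if another tuple agrees with it on all of its non-null attributes and has strictly fewer nulls. Relations are modeled as lists of dictionaries with `None` standing for null; the function names `outer_union`, `subsumes`, and `minimum_union` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional

# A row maps attribute names to values; None plays the role of SQL NULL.
Row = Dict[str, Optional[object]]


def outer_union(r: List[Row], s: List[Row]) -> List[Row]:
    """Pad the rows of both relations with None to the union of their attributes."""
    rows = list(r) + list(s)
    attrs: List[str] = []
    for row in rows:
        for a in row:
            if a not in attrs:
                attrs.append(a)
    return [{a: row.get(a) for a in attrs} for row in rows]


def subsumes(t1: Row, t2: Row) -> bool:
    """True if t1 subsumes t2 (assumed definition): t1 agrees with t2 on every
    non-null attribute of t2 and t1 has strictly fewer nulls than t2."""
    agrees = all(t1[a] == v for a, v in t2.items() if v is not None)
    nulls1 = sum(v is None for v in t1.values())
    nulls2 = sum(v is None for v in t2.values())
    return agrees and nulls1 < nulls2


def minimum_union(r: List[Row], s: List[Row]) -> List[Row]:
    """Outer union followed by removal of duplicates and subsumed tuples."""
    distinct: List[Row] = []
    for row in outer_union(r, s):
        if row not in distinct:  # drop exact duplicates, keep first occurrence
            distinct.append(row)
    # Naive pairwise check: keep a tuple only if no other tuple subsumes it.
    return [t for t in distinct if not any(subsumes(u, t) for u in distinct)]


if __name__ == "__main__":
    r = [{"name": "Ann", "city": "Berlin"}, {"name": "Bob", "city": None}]
    s = [
        {"name": "Bob", "city": "Paris", "phone": "555"},
        {"name": "Ann", "city": None, "phone": None},
    ]
    for row in minimum_union(r, s):
        print(row)
```

In this example the tuple (Bob, null, null) is subsumed by (Bob, Paris, 555), and (Ann, null, null) is subsumed by (Ann, Berlin, null), so only the two more informative tuples remain. The pairwise comparison is quadratic in the number of tuples, which is the kind of cost that more efficient subsumption implementations aim to avoid.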