Subsumption and complementation as data fusion operators

Authors:
Jens Bleiholder;Sascha Szott;Melanie Herschel;Frank Kaufer;Felix Naumann
Affiliations:
Hasso-Plattner-Institut, Potsdam, Germany;Konrad-Zuse-Zentrum, Berlin, Germany;Universität Tübingen, Tübingen, Germany;Hasso-Plattner-Institut, Potsdam, Germany;Hasso-Plattner-Institut, Potsdam, Germany
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Abstract

The goal of data fusion is to combine several representations of one real-world object into a single, consistent representation, e.g., in data integration. A very popular operator for performing data fusion is the minimum union operator. It is defined as the outer union followed by the removal of subsumed tuples. Minimum union is used in other applications as well, for instance in database query optimization to rewrite outer join queries and in the Semantic Web community to implement SPARQL's OPTIONAL operator. Despite its wide applicability, there are only a few efficient implementations, and until now minimum union has not been a relational database primitive. This paper fills this gap: we present implementations of subsumption that serve as a building block for minimum union. Furthermore, we consider this operator as a database primitive and show how to optimize query plans in the presence of subsumption and minimum union through rule-based plan transformations. Experiments on both artificial and real-world data show that our algorithms outperform existing algorithms used for subsumption in terms of runtime and that they scale to large volumes of data. In the context of data integration, we observe that performing data fusion calls for more than subsumption and minimum union. Therefore, another contribution of this paper is the definition of the complementation and complement union operators. Intuitively, these allow merging tuples that have complementing values and thus eliminate unnecessary null values.
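To make the operators named in the abstract concrete, here is a minimal Python sketch, not the paper's implementation. It uses a naive quadratic scan, whereas the paper is about efficient algorithms. The function names (`outer_union`, `subsumes`, `minimum_union`, `complements`, `merge`) and the sample relations are hypothetical. The subsumption test follows the usual reading (one tuple has strictly more nulls and agrees with the other wherever it is non-null). The complementation test is an assumed reading of "tuples that have complementing values": the tuples never conflict, share at least one non-null value, and each fills a null of the other.

```python
# Tuples are Python tuples over a fixed schema; None stands for NULL.

def outer_union(r, r_schema, s, s_schema):
    """Pad both inputs to the combined schema with None, then concatenate."""
    schema = list(r_schema) + [a for a in s_schema if a not in r_schema]

    def pad(row, source_schema):
        values = dict(zip(source_schema, row))
        return tuple(values.get(attr) for attr in schema)

    rows = [pad(t, r_schema) for t in r] + [pad(t, s_schema) for t in s]
    return schema, rows


def subsumes(t1, t2):
    """True if t1 subsumes t2: t2 agrees with t1 on every non-NULL value
    of t2 and t2 has strictly more NULLs than t1."""
    agrees = all(b is None or a == b for a, b in zip(t1, t2))
    return agrees and t2.count(None) > t1.count(None)


def minimum_union(r, r_schema, s, s_schema):
    """Outer union followed by removal of subsumed tuples (naive O(n^2)).
    Exact duplicates are left untouched in this sketch."""
    schema, rows = outer_union(r, r_schema, s, s_schema)
    kept = [t for t in rows if not any(subsumes(u, t) for u in rows)]
    return schema, kept


def complements(t1, t2):
    """Assumed pairwise test: no conflicting values, at least one shared
    non-NULL value, and each tuple fills a NULL of the other."""
    pairs = list(zip(t1, t2))
    compatible = all(a is None or b is None or a == b for a, b in pairs)
    shares_value = any(a is not None and a == b for a, b in pairs)
    t1_adds = any(a is not None and b is None for a, b in pairs)
    t2_adds = any(a is None and b is not None for a, b in pairs)
    return compatible and shares_value and t1_adds and t2_adds


def merge(t1, t2):
    """Combine two complementing tuples, taking the non-NULL value per attribute."""
    return tuple(a if a is not None else b for a, b in zip(t1, t2))


# Hypothetical sample data: two sources describing the same people.
r_schema, r = ("name", "age"), [("Alice", 30), ("Bob", None)]
s_schema, s = ("name", "city"), [("Alice", "Berlin"), ("Bob", "Potsdam")]

schema, fused = minimum_union(r, r_schema, s, s_schema)
print(schema)
print(fused)
print(merge(fused[0], fused[1]) if complements(fused[0], fused[1]) else None)
```

In this sample, minimum union drops the padded tuple for Bob that carries no information beyond the other Bob tuple, but it keeps both Alice tuples because neither subsumes the other. Merging those two tuples is the kind of step the abstract attributes to complementation. A full complement union over more than two tuples needs extra care, since several tuples may complement each other in different combinations, and this sketch does not attempt it.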