Integrating domain heterogeneous data sources using decomposition aggregation queries

  • Authors:
  • Jian Xu; Rachel Pottinger

  • Venue:
  • Information Systems
  • Year:
  • 2014

Abstract

This paper introduces the decomposition aggregation query (DAQ), which extends semantic integration queries by allowing query translation to create aggregate queries based on the DAQ's novel three-role structure. We describe the application of DAQs to integrating domain-heterogeneous data sources, the new semantics of DAQ answers, and the query translation algorithm, called "aggregation rewriting". A central problem in optimizing DAQ processing is determining the data sources to which the DAQ is translated. Our source selection algorithm has cover-finding and partitioning steps, which are optimized to (1) lower the processing overhead while speeding up query answering and (2) eliminate duplicates with minimal overhead. We establish connections between the source selection optimizations and classic NP-hard optimization problems, and solve the resulting problems with efficient solvers. We empirically study both the DAQ query translation and the source selection algorithms using real-world and synthetic data sets. The results show that the query translation algorithms scale satisfactorily in the size of both aggregations and data sources, and that the source selection algorithms save a considerable amount of computational resources.
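
The abstract does not give the source selection algorithm itself, so the following is only an illustrative sketch of the kind of problem it describes, not the authors' method. It assumes that cover-finding can be modeled as classic set cover (a well-known NP-hard problem) and uses the standard greedy heuristic. It further assumes that partitioning means assigning each required item to exactly one chosen source, so that no item is answered twice. The function names `greedy_cover` and `partition`, the source names, and the data are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(required: Set[str], sources: Dict[str, Set[str]]) -> List[str]:
    """Choose sources until every required item is covered.

    This is the textbook greedy set-cover heuristic, used here as a
    stand-in; it is NOT the algorithm from the paper.
    """
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        if not sources:
            raise ValueError("no sources available")
        # Pick the source covering the most still-uncovered items.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(sources), key=lambda name: len(sources[name] & uncovered))
        gained = sources[best] & uncovered
        if not gained:
            raise ValueError(f"items not covered by any source: {uncovered!r}")
        chosen.append(best)
        uncovered -= gained
    return chosen


def partition(
    required: Set[str], sources: Dict[str, Set[str]], chosen: List[str]
) -> Dict[str, Set[str]]:
    """Assign each required item to exactly one chosen source.

    Assumed reading of the "partitioning" step: overlapping coverage is
    resolved so that no item is contributed by two sources.
    """
    remaining = set(required)
    assignment: Dict[str, Set[str]] = {}
    for name in chosen:
        assignment[name] = sources[name] & remaining
        remaining -= assignment[name]
    return assignment


if __name__ == "__main__":
    # Hypothetical example: items a..e spread over four overlapping sources.
    required = {"a", "b", "c", "d", "e"}
    sources = {
        "s1": {"a", "b", "c"},
        "s2": {"c", "d"},
        "s3": {"d", "e"},
        "s4": {"a"},
    }
    chosen = greedy_cover(required, sources)
    print(chosen)
    print(partition(required, sources, chosen))
```

In this toy input, two sources suffice to cover all five items, and the partition gives each item a single owner. The greedy heuristic is only an approximation of the optimal cover; the abstract states that the paper relies on efficient solvers for its actual optimization problems.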