Optimization of sub-query processing in distributed data integration systems

Authors:
Gang Chen;Yongwei Wu;Jia Liu;Guangwen Yang;Weimin Zheng
Affiliations:
Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China (all authors)
Venue:
Journal of Network and Computer Applications
Year:
2011

Abstract

Data integration systems (DISs) are becoming essential as Cloud/Grid applications need to integrate and analyze data from geographically distributed sources. A DIS gathers data from multiple remote sources, then integrates and analyzes it to produce a query result. Because Clouds/Grids span wide-area networks, communication cost usually dominates overall query response time, so minimizing communication cost should improve query performance. In our method, the DIS uses a data-flow-style query execution model. Each query plan is mapped to a group of μEngines, each of which is a program corresponding to a particular operator, so sub-queries from concurrent queries can share μEngines. We reconstruct these sub-queries to exploit the data that overlaps among them. As a result, every sub-query still obtains its result while overall communication overhead is reduced. Experimental results show that our reconstruction algorithm reduces the average query completion time by 32-48% when the DIS runs a group of parameterized queries, and by 25-35% when it runs a group of non-parameterized queries.
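
The abstract does not spell out the reconstruction algorithm, so the following is only a minimal sketch of the general idea of sharing overlapping data among concurrent sub-queries. It assumes, purely for illustration, that each sub-query is a range predicate on a single key of one remote source; `SubQuery`, `merge_overlapping`, `fetch`, `run_shared`, and `run_independent` are hypothetical names, and `fetch` stands in for a remote call. Overlapping ranges are merged, each merged range is transferred once, and every original sub-query is then answered by filtering locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    """Hypothetical sub-query: all rows of a source with lo <= key <= hi."""
    name: str
    lo: int
    hi: int


def merge_overlapping(subqueries):
    """Merge overlapping key ranges into a smaller set of covering ranges."""
    merged = []
    for q in sorted(subqueries, key=lambda q: q.lo):
        if merged and q.lo <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], q.hi))
        else:
            merged.append((q.lo, q.hi))
    return merged


def fetch(source, lo, hi):
    """Stand-in for a remote call; returns rows whose key lies in [lo, hi]."""
    return [row for row in source if lo <= row["key"] <= hi]


def run_shared(source, subqueries):
    """Transfer each merged range once, then answer every sub-query locally."""
    transferred = []
    for lo, hi in merge_overlapping(subqueries):
        transferred.extend(fetch(source, lo, hi))
    results = {
        q.name: [r for r in transferred if q.lo <= r["key"] <= q.hi]
        for q in subqueries
    }
    return results, len(transferred)


def run_independent(source, subqueries):
    """Baseline: every sub-query fetches its own rows from the source."""
    results = {q.name: fetch(source, q.lo, q.hi) for q in subqueries}
    return results, sum(len(rows) for rows in results.values())


# Toy data: one source with 100 rows and three concurrent sub-queries.
source = [{"key": k, "value": k * k} for k in range(100)]
queries = [SubQuery("q1", 10, 40), SubQuery("q2", 30, 60), SubQuery("q3", 80, 90)]

shared_results, shared_rows = run_shared(source, queries)
indep_results, indep_rows = run_independent(source, queries)
print(shared_rows, indep_rows)
```

In this toy setup both strategies return identical per-query results, but the shared run transfers fewer rows because the range overlapping `q1` and `q2` crosses the network only once. The row counts are an artifact of the toy data and say nothing about the paper's measured improvements.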