Data integration systems (DIS) are becoming essential as Cloud/Grid applications need to integrate and analyze data from geographically distributed data sources. A DIS gathers data from multiple remote sources, then integrates and analyzes it to produce a query result. Because Clouds/Grids are distributed over wide-area networks, communication cost usually dominates overall query response time, so minimizing communication cost can be expected to improve query performance. In our method, the DIS uses a data-flow-style query execution model. Each query plan is mapped to a group of μEngines, each of which is a program corresponding to a particular operator, so sub-queries from concurrent queries can share μEngines. We reconstruct these sub-queries to exploit the data that overlaps among them. As a result, every sub-query still obtains its result while overall communication overhead is reduced. Experimental results show that our reconstruction algorithm reduces the average query completion time by 32-48% when the DIS runs a group of parameterized queries, and by 25-35% when it runs a group of non-parameterized queries.
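The idea of reconstructing sub-queries around overlapping data can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each sub-query is a simple key-range selection against one remote source, and the names `SubQuery`, `merge_ranges`, `run_shared`, and `fetch_remote` are hypothetical. Overlapping ranges are coalesced so each remote row is transferred once, and every sub-query then filters its own result locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    """Hypothetical sub-query: select keys in the closed range [lo, hi]."""
    query_id: str
    lo: int
    hi: int


def merge_ranges(subqueries):
    """Coalesce overlapping [lo, hi] ranges into disjoint covering ranges."""
    merged = []
    for sq in sorted(subqueries, key=lambda s: s.lo):
        if merged and sq.lo <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of fetching twice.
            merged[-1][1] = max(merged[-1][1], sq.hi)
        else:
            merged.append([sq.lo, sq.hi])
    return [tuple(r) for r in merged]


def run_shared(subqueries, fetch_remote):
    """Fetch each merged range once, then answer every sub-query locally."""
    rows = []
    for lo, hi in merge_ranges(subqueries):
        rows.extend(fetch_remote(lo, hi))
    return {
        sq.query_id: [r for r in rows if sq.lo <= r <= sq.hi]
        for sq in subqueries
    }


# Toy stand-in for a remote source (assumption): keys 0..99.
REMOTE = list(range(100))
transferred = []  # number of rows shipped per remote call


def fetch_remote(lo, hi):
    result = [k for k in REMOTE if lo <= k <= hi]
    transferred.append(len(result))
    return result


subqueries = [
    SubQuery("q1", 10, 40),
    SubQuery("q2", 30, 60),
    SubQuery("q3", 80, 90),
]
results = run_shared(subqueries, fetch_remote)

shared_cost = sum(transferred)                            # rows shipped with sharing
naive_cost = sum(sq.hi - sq.lo + 1 for sq in subqueries)  # rows shipped without
print(shared_cost, naive_cost)
```

In this toy setting, q1 and q2 overlap on keys 30 to 40, so the merged fetch ships those rows once rather than twice. Real sub-queries would involve richer predicates and a cost model for deciding when merging pays off.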