Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Query caching and optimization in distributed mediator systems
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems
Distributed and Parallel Databases
Building regression cost models for multidatabase systems
DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
A Query Sampling Method of Estimating Local Cost Parameters in a Multidatabase System
Proceedings of the Tenth International Conference on Data Engineering
Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Query Optimization in a Heterogeneous DBMS
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Data Reduction and Noise Filtering for Predicting Times Series
WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
Learning response time for WebSources using query feedback and application in query optimization
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Query processing over the Internet involving autonomous data sources is a major task in data integration. It requires the estimated costs of possible queries in order to select the best one that has the minimum cost. In this context, the cost of a query is affected by three factors: network congestion, server contention state, and complexity of the query. In this paper, we study the effects of both the network congestion and server contention state on the cost of a query. We refer to these two factors together as system contention states. We present a new approach to determining the system contention states by clustering the costs of a sample query. For each system contention state, we construct two cost formulas for unary and join queries respectively using the multiple regression process. When a new query is submitted, its system contention state is estimated first using either the time slides method or the statistical method. The cost of the query is then calculated using the corresponding cost formulas. The estimated cost of the query is further adjusted to improve its accuracy. Our experiments show that our methods can produce quite accurate cost estimates of the submitted queries to remote data sources over the Internet.