Database Size Estimation by Query Performance -- A Complexity Aspect

Authors:
Ye Zhou; Chi-Hung Chi
Venue:
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Year:
2012

Abstract

Many techniques have been proposed for database size estimation. However, the emergence of cloud computing introduces new opportunities along with new challenges. In the cloud, the service provider can set up a monitoring proxy because it owns the cloud infrastructure. The collected data allow the service provider to estimate the size of a database that may otherwise be a black box to it. We claim that the relationship between query performance and data size can be captured by a complexity function, and that such a function can be leveraged to estimate table size given a query's execution time. In this paper, we propose a fine-grained framework called Database Size Estimation based on Complexity (DSEC) to estimate the size of databases from the service provider's perspective. In particular, we argue that only a small fraction of tables, which we refer to as "important tables", significantly impact service performance. We illustrate the process of locating important tables on three typical benchmarks: RUBiS, RUBBoS and TPC-W. Finally, we describe extensive experiments on TPC-W (the most challenging of the three) that evaluate the effectiveness and efficiency of DSEC in various scenarios.
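The abstract's central idea is that query execution time depends on table size through a complexity function, and that inverting this function turns an observed execution time into a size estimate. The minimal Python sketch below illustrates that idea under stated assumptions: the candidate complexity classes, the model form t = a·f(n) + b, the least-squares model selection and the bisection inversion are all choices made here for illustration, and the names `fit_complexity` and `estimate_size` are hypothetical. The abstract does not specify how DSEC fits or inverts its complexity functions.

```python
import math

# Hypothetical candidate complexity classes f(n); an illustrative assumption,
# not the set used by DSEC.
CANDIDATES = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
}

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, sse)."""
    m = len(xs)
    mx = sum(xs) / m
    my = sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def fit_complexity(sizes, times):
    """Choose the candidate f whose model t = a*f(n) + b has the lowest squared error."""
    best = None
    for name, f in CANDIDATES.items():
        a, b, sse = fit_linear([f(n) for n in sizes], times)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    return best[:3]  # (name, a, b)

def estimate_size(time, model, lo=1.0, hi=1e12):
    """Invert t = a*f(n) + b by bisection; assumes a > 0 so time grows with n."""
    name, a, b = model
    f = CANDIDATES[name]
    for _ in range(200):
        mid = (lo + hi) / 2
        if a * f(mid) + b < time:
            lo = mid  # predicted time too small: the table must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Usage with synthetic (not measured) timings that follow 0.002 * n log2 n + 5.
sizes = [1000, 2000, 4000, 8000, 16000]
times = [0.002 * n * math.log2(n) + 5 for n in sizes]
model = fit_complexity(sizes, times)
print(model[0])  # chosen complexity class
print(round(estimate_size(270.75, model)))  # size implied by an observed time
```

Bisection is used instead of a closed-form inverse so that the same routine works for any monotonically increasing candidate, including n log n, which has no elementary inverse.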