Many techniques have been proposed for database size estimation. However, the emergence of cloud computing introduces new opportunities along with new challenges. In the cloud, a service provider can set up a monitoring proxy because it owns the cloud infrastructure. The collected data allows the service provider to estimate the size of a database that may otherwise be a black box to it. We claim that the relationship between query performance and data size can be captured by a complexity function, and that such a function can be used to estimate table size from a given query execution time. In this paper, we propose a fine-grained framework called Database Size Estimation based on Complexity (DSEC) to estimate the size of databases from the perspective of the service provider. In particular, we argue that only a small fraction of tables, which we refer to as "important tables", significantly affect service performance. We illustrate the process of locating important tables on three typical benchmarks: RUBiS, RUBBoS and TPC-W. Finally, we describe extensive experiments on TPC-W (the most challenging of the three) to evaluate the effectiveness and efficiency of DSEC in various scenarios.
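The idea of inverting a complexity function can be illustrated with a minimal sketch. This is not the DSEC framework itself, whose details the abstract does not give. It assumes that calibration pairs of (table size, execution time) are available for one query, that execution time follows `a * f(n) + b` for one of two hypothetical candidate shapes (`linear`, `n_log_n`), and that the fitted slope is positive. All function and variable names are illustrative.

```python
import math

# Candidate complexity shapes (assumed for illustration, not taken from the paper).
CANDIDATES = {
    "linear": lambda n: float(n),
    "n_log_n": lambda n: n * math.log2(n),
}

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, sum of squared residuals)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def fit_complexity(sizes, times):
    """Pick the candidate shape f that best explains time = a*f(size) + b."""
    best = None
    for name, f in CANDIDATES.items():
        a, b, sse = fit_line([f(n) for n in sizes], times)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    return best[:3]

def estimate_size(model, observed_time, lo=2, hi=10**12):
    """Invert the fitted model by bisection (assumes it increases with size)."""
    name, a, b = model
    f = CANDIDATES[name]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if a * f(mid) + b < observed_time:
            lo = mid
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    # Synthetic calibration data; a real proxy would supply measured timings.
    sizes = [1000, 10000, 100000, 1000000]
    times = [2e-6 * n * math.log2(n) + 0.01 for n in sizes]
    model = fit_complexity(sizes, times)
    print(model[0], estimate_size(model, observed_time=5.0))
```

Bisection is used for the inversion because it only requires the fitted function to be monotone in the table size, so the same routine works for any candidate shape without a closed-form inverse.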