Automatic physical database tuning middleware for web-based applications

  • Authors:
  • Jozsef Patvarczki;Neil T. Heffernan

  • Affiliations:
  • Worcester Polytechnic Institute, Computer Science Department, Worcester, Massachusetts;Worcester Polytechnic Institute, Computer Science Department, Worcester, Massachusetts

  • Venue:
  • ADBIS'11 Proceedings of the 15th international conference on Advances in databases and information systems
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we conceptualize the database layout problem as a state space search problem. A state is a given assignment of tables to computer servers. We begin with a database and collect, for use as a workload input, a sequence of queries that were executed during normal usage of the database. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. We do a time intensive search over different table layouts, and at each iteration, physically create the configurations, and evaluate the total throughput of the system. We report our empirical results of two forms. First, we empirically validate as facts the heuristics that Database Administrators (DBAs) currently use as in doing this task manually: for tables that have a high ratio of update, delete, and insert to retrieval queries one should horizontally partition, but for a small ratio one should fully replicate a table. Such rules of thumb are reasonable, however we want to parameterize some common guidelines that DBAs can use. Our second empirical result is that we applied this search to our existing data test case and found a reliable increase in total system throughput. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput (and potentially reduce the investment in hardware). To make this search more practical, we want to learn reasonable rules to guide the search to eliminate many layout configurations that are not very likely to succeed. The second aspect of our project (not reported here) is to use the created configurations as input into a machine learning system, to create general rules about when to use the different layout operators.