Automating Physical Database Design in a Parallel Database
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data
Automated Selection of Materialized Views and Indexes in SQL Databases
VLDB '00: Proceedings of the 26th International Conference on Very Large Data Bases
AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning
SSDBM '04: Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Ganymed: Scalable Replication for Transactional Web Applications
Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware
GlobeDB: Autonomic Data Replication for Web Applications
WWW '05: Proceedings of the 14th International Conference on World Wide Web
GlobeTP: Template-Based Database Replication for Scalable Web Applications
Proceedings of the 16th International Conference on World Wide Web
MapReduce: Simplified Data Processing on Large Clusters
OSDI '04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6
DBFarm: A Scalable Cluster for Multiple Databases
Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware
A Comparison of Approaches to Large-Scale Data Analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data
Data Warehouse Technology by Infobright
Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data
Performance Driven Database Design for Scalable Web Applications
ADBIS '09: Proceedings of the 13th East European Conference on Advances in Databases and Information Systems
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
Proceedings of the VLDB Endowment
In this paper we formulate the database layout problem as a state-space search problem. A state is an assignment of tables to server machines. We begin with a database and collect, as workload input, a sequence of queries executed during normal use of the database. The search operators are to fully replicate, horizontally partition, vertically partition, or denormalize a table. We perform a time-intensive search over different table layouts; at each iteration we physically create the candidate configuration and measure the total throughput of the system. We report two kinds of empirical results. First, we empirically validate the heuristics that database administrators (DBAs) currently apply when doing this task manually: a table with a high ratio of update, delete, and insert queries to retrieval queries should be horizontally partitioned, while a table with a low ratio should be fully replicated. Such rules of thumb are reasonable, but we want to parameterize the common guidelines that DBAs use. Second, we applied this search to our existing test database and found a reliable increase in total system throughput. The search over layouts is expensive, but we argue that our method is practical and useful: organizations trying to scale up their Web-based applications would gladly spend a few weeks of CPU time to increase system throughput (and potentially reduce their hardware investment). To make the search more practical, we want to learn rules that guide it past layout configurations that are unlikely to succeed. The second part of our project (not reported here) uses the created configurations as input to a machine learning system to derive general rules about when to apply each layout operator.
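The search described above can be sketched in a few lines. This is a minimal, illustrative greedy variant, not the authors' implementation: the operator names, the toy workload format (per-table write/read counts), and the scoring function are assumptions. In particular, `estimate_throughput` stands in for the paper's expensive step of physically creating each configuration and replaying the workload; here it simply encodes the DBA heuristic the paper validates (write-heavy tables favor horizontal partitioning, read-heavy tables favor full replication).

```python
import itertools

# The four layout operators named in the abstract.
OPERATORS = ["replicate", "horizontal_partition", "vertical_partition", "denormalize"]


def estimate_throughput(state, workload):
    # Hypothetical stand-in for building the configuration and measuring real
    # throughput. workload maps table -> (write_count, read_count).
    score = 0.0
    for table, layout in state.items():
        writes, reads = workload[table]
        ratio = writes / max(reads, 1)
        if layout == "horizontal_partition":
            score += 10 * ratio          # rewards write-heavy tables
        elif layout == "replicate":
            score += 10 / (1 + ratio)    # rewards read-heavy tables
        else:
            score += 1.0                 # neutral placeholder
    return score


def greedy_layout_search(tables, workload):
    # Start from full replication; repeatedly apply the single-table operator
    # that most improves estimated throughput, until no operator helps.
    state = {t: "replicate" for t in tables}
    best = estimate_throughput(state, workload)
    improved = True
    while improved:
        improved = False
        for table, op in itertools.product(tables, OPERATORS):
            candidate = dict(state, **{table: op})
            score = estimate_throughput(candidate, workload)
            if score > best:
                state, best, improved = candidate, score, True
    return state, best


if __name__ == "__main__":
    # "orders" is write-heavy, "catalog" is read-heavy (illustrative numbers).
    workload = {"orders": (90, 10), "catalog": (5, 95)}
    layout, score = greedy_layout_search(["orders", "catalog"], workload)
    print(layout)
```

Under this toy scoring function, the search partitions the write-heavy table and replicates the read-heavy one, matching the rule of thumb the paper sets out to validate; the real system instead evaluates each candidate against the recorded query workload.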