We present a formal analysis of the database layout problem, i.e., the problem of determining how database objects such as tables and indexes are assigned to disk drives. This layout directly affects the I/O performance of the entire system. The traditional approach of striping each object across all available disk drives aims to maximize I/O parallelism. However, it is suboptimal when queries co-access two or more database objects, e.g., during a merge join of two tables. In that case the disk head must alternate between the co-accessed objects, which increases the number of random disk seeks. We adopt an existing model that accounts for both the benefit of I/O parallelism and the overhead of random disk accesses, in the context of a query workload that includes co-access of database objects. The resulting optimization problem is intractable in general, so we use techniques from approximation algorithms to obtain provable performance guarantees. Optimally exploiting I/O parallelism alone suggests uniformly striping data objects, even for heterogeneous files and disks. In contrast, optimizing random disk access alone would assign each data object to a single disk drive. This confirms the intuition that the two effects are in tension. We provide approximation algorithms for the trade-off between the two effects and show that our algorithm achieves the best possible approximation ratio.
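The tension between parallelism and random access can be illustrated with a deliberately tiny example. The sketch below uses a hypothetical toy cost model. It is an assumption made for illustration and is not the model adopted in this work.

Under this toy model:

- Each disk reads its share of the queried objects in parallel with the others.
- A disk that holds more than one object co-accessed by the same query is charged an assumed interference factor `1 + alpha`, which stands in for the extra random seeks.
- A query finishes when its slowest disk does.

The names `query_time`, `workload_cost`, `alpha`, and the example objects `A` and `B` are all illustrative.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy cost model for illustration only (not the paper's model).
# layout[obj][d] is the fraction of object `obj` stored on disk d; rows sum to 1.
Layout = Dict[str, List[float]]
Workload = List[Tuple[float, List[str]]]  # (frequency, objects co-accessed)


def query_time(layout: Layout, sizes: Dict[str, float],
               bandwidth: List[float], query: List[str], alpha: float) -> float:
    """Completion time of a query that fully scans every object in `query`.

    Disks work in parallel, so the query ends when the slowest disk ends.
    A disk serving more than one co-accessed object pays an ASSUMED
    interference factor (1 + alpha) as a proxy for extra random seeks.
    """
    worst = 0.0
    for d, bw in enumerate(bandwidth):
        shares = [layout[obj][d] * sizes[obj] for obj in query]
        transfer = sum(shares) / bw
        objects_here = sum(1 for s in shares if s > 0)
        if objects_here > 1:
            transfer *= 1 + alpha  # penalty for interleaved (random) access
        worst = max(worst, transfer)
    return worst


def workload_cost(layout: Layout, sizes: Dict[str, float],
                  bandwidth: List[float], workload: Workload,
                  alpha: float) -> float:
    """Frequency-weighted sum of query times over the workload."""
    return sum(freq * query_time(layout, sizes, bandwidth, q, alpha)
               for freq, q in workload)


sizes = {"A": 100.0, "B": 100.0}
bandwidth = [1.0, 1.0]
striped = {"A": [0.5, 0.5], "B": [0.5, 0.5]}   # stripe both objects everywhere
isolated = {"A": [1.0, 0.0], "B": [0.0, 1.0]}  # one object per disk
alpha = 0.5

scan_a = ["A"]        # single-object scan: benefits from parallelism
join_ab = ["A", "B"]  # merge join: co-accesses A and B
scan_heavy: Workload = [(0.9, scan_a), (0.1, join_ab)]
join_heavy: Workload = [(0.1, scan_a), (0.9, join_ab)]

for name, layout in [("striped", striped), ("isolated", isolated)]:
    print(name,
          query_time(layout, sizes, bandwidth, scan_a, alpha),
          query_time(layout, sizes, bandwidth, join_ab, alpha),
          workload_cost(layout, sizes, bandwidth, scan_heavy, alpha),
          workload_cost(layout, sizes, bandwidth, join_heavy, alpha))
```

With `alpha = 0.5`, striping halves the time of the single-object scan (50 vs. 100 units). However, the join of `A` and `B` costs 150 units on the striped layout against 100 on the isolated one.

As a result, the two workload mixes favor opposite layouts:

- The scan-heavy mix favors striping (about 60 vs. 100 units).
- The join-heavy mix favors isolation (about 140 vs. 100 units).

Neither extreme wins for every workload, which is the trade-off the layout optimization has to balance.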