Partitioning data over multiple storage servers is an attractive way to increase throughput for web-like workloads. However, there is often no single partitioning that yields good performance for all queries, and it can be challenging for the web developer to determine how best to execute queries over partitioned data. This paper presents DIXIE, a SQL query planner, optimizer, and executor for databases horizontally partitioned over multiple servers. DIXIE focuses on increasing interquery parallel speedup by involving as few servers as possible in each query. One way it does this is by supporting tables with multiple copies partitioned on different columns, which expands the set of queries that can be satisfied from a single server. DIXIE automatically transforms SQL queries to execute over a partitioned database, using a cost model and plan generator that exploit multiple table copies. We evaluate DIXIE on a database and query stream taken from Wikipedia, partitioned across ten MySQL servers. By adding one copy of a 13 MB table and using DIXIE's query optimizer, we achieve a throughput improvement of 3.2x over a single optimized partitioning of each table and 8.5x over the same data on a single server. On specific queries, DIXIE with table copies increases throughput linearly with the number of servers, while the best partitioning with a single copy of each table achieves little scaling. For a large class of joins, which conventional wisdom holds require tables partitioned on the join keys, DIXIE can find higher-performance plans using other partitionings.
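The benefit of keeping extra table copies can be illustrated with a small sketch of copy selection. It is not DIXIE's planner and makes several assumptions. Each copy is hash-partitioned on exactly one column, and only equality predicates are considered. The cost is simply the number of servers a lookup must contact. The names `TableCopy` and `choose_copy`, the crc32-based partitioning function, and the `page` schema are all hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class TableCopy:
    """One physical copy of a logical table, hash-partitioned on one column."""
    table: str
    partition_column: str
    num_servers: int

    def server_for(self, value: object) -> int:
        # Hypothetical partitioning function: a stable crc32 hash so the
        # value-to-server mapping is the same in every process.
        return zlib.crc32(str(value).encode("utf-8")) % self.num_servers


def choose_copy(
    copies: List[TableCopy], table: str, equality_preds: Dict[str, object]
) -> Tuple[TableCopy, Set[int]]:
    """Pick the copy of `table` whose partitioning touches the fewest servers.

    `equality_preds` maps a column name to the constant in `WHERE col = const`.
    """
    candidates = [c for c in copies if c.table == table]
    if not candidates:
        raise ValueError(f"no copies of table {table!r}")

    def servers_touched(copy: TableCopy) -> Set[int]:
        if copy.partition_column in equality_preds:
            # Predicate on this copy's partition key: one server holds the rows.
            return {copy.server_for(equality_preds[copy.partition_column])}
        # Otherwise the lookup must be broadcast to every partition.
        return set(range(copy.num_servers))

    best = min(candidates, key=lambda c: len(servers_touched(c)))
    return best, servers_touched(best)


if __name__ == "__main__":
    # Hypothetical schema: `page` partitioned on page_id, plus a second
    # copy of the same table partitioned on page_title.
    copies = [
        TableCopy("page", "page_id", 10),
        TableCopy("page", "page_title", 10),
    ]
    copy, servers = choose_copy(copies, "page", {"page_title": "Main_Page"})
    print(copy.partition_column, len(servers))  # prints "page_title 1"
```

With only the `page_id` copy, the same title lookup would have to be broadcast to all ten servers. The second copy lets it run on one server, leaving the others free to serve concurrent queries. A real system must also keep the copies consistent on writes, which this sketch ignores.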