In recent years, Massively Parallel Processors (MPPs) have gained ground as a platform for processing vast amounts of data. In such environments, data is partitioned across multiple compute nodes, which results in dramatic performance improvements during parallel query execution. To evaluate certain relational operators in a query correctly, data sometimes needs to be re-partitioned (i.e., moved) across compute nodes. Since data movement operations are much more expensive than relational operations, it is crucial to design a data partitioning strategy that minimizes the cost of these transfers. A good partitioning strategy depends strongly on how the parallel system will be used. In this paper we present a partitioning advisor that recommends the best partitioning design for an expected workload. Our tool recommends which tables should be replicated (i.e., copied to every compute node) and which should be distributed according to specific column(s), so that the cost of evaluating similar workloads is minimized. In contrast to previous work, our techniques are deeply integrated with the underlying parallel query optimizer, which results in more accurate recommendations in a shorter amount of time. Our experimental evaluation on a real MPP system, Microsoft SQL Server 2008 Parallel Data Warehouse, with both real and synthetic workloads shows the effectiveness of the proposed techniques and the importance of deep integration of the partitioning advisor with the underlying query optimizer.
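To make the replicate-versus-distribute decision concrete, the following is a minimal sketch of the search problem only. It is not the advisor described above: it enumerates every design exhaustively and scores it with a toy data-movement estimate, whereas the paper's techniques rely on the parallel query optimizer's own cost model. The table names, row counts, workload weights, the `max_replicated_rows` budget, and all function names are hypothetical and chosen for illustration.

```python
from itertools import product

REPLICATE = "REPLICATE"  # marker for "copy the table to every compute node"


def candidate_designs(tables):
    """Yield every design: each table is replicated or distributed on one column."""
    names = sorted(tables)
    options = [[REPLICATE] + list(tables[n]["columns"]) for n in names]
    for choice in product(*options):
        yield dict(zip(names, choice))


def join_move_cost(design, tables, join):
    """Toy estimate (an assumption) of rows moved to evaluate one equi-join."""
    (lt, lc), (rt, rc) = join
    left, right = design[lt], design[rt]
    if left == REPLICATE or right == REPLICATE:
        return 0  # a full copy on every node makes the join local
    if left == lc and right == rc:
        return 0  # both sides are already co-located on the join key
    if left == lc:
        return tables[rt]["rows"]  # re-partition only the right side
    if right == rc:
        return tables[lt]["rows"]  # re-partition only the left side
    return tables[lt]["rows"] + tables[rt]["rows"]  # re-partition both sides


def workload_cost(design, tables, workload):
    """Weighted sum of the per-join movement estimates."""
    return sum(w * join_move_cost(design, tables, j) for j, w in workload)


def recommend(tables, workload, max_replicated_rows):
    """Return the cheapest design whose replicated tables fit the row budget."""
    best, best_cost = None, None
    for design in candidate_designs(tables):
        replicated = sum(
            tables[t]["rows"] for t, c in design.items() if c == REPLICATE
        )
        if replicated > max_replicated_rows:
            continue  # without a budget, replicating everything would always win
        cost = workload_cost(design, tables, workload)
        if best_cost is None or cost < best_cost:
            best, best_cost = design, cost
    return best, best_cost


# Hypothetical schema and workload: (join, relative frequency) pairs.
tables = {
    "customer": {"rows": 100_000, "columns": ["c_id", "nation_id"]},
    "nation": {"rows": 25, "columns": ["n_id"]},
    "orders": {"rows": 1_000_000, "columns": ["o_id", "cust_id"]},
}
workload = [
    ((("orders", "cust_id"), ("customer", "c_id")), 10),
    ((("customer", "nation_id"), ("nation", "n_id")), 5),
]

best, cost = recommend(tables, workload, max_replicated_rows=1_000)
print(best, cost)
```

In this toy setting the small `nation` table is replicated, while `orders` and `customer` are distributed on their shared join key, so neither join needs to move data. Exhaustive enumeration grows exponentially with the number of tables, and the estimate ignores filters, aggregations, and skew, which is one reason a practical advisor would lean on the optimizer's cost model instead.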