Automating physical database design in a parallel database

Authors:
Jun Rao;Chun Zhang;Nimrod Megiddo;Guy Lohman
Affiliations:
IBM Almaden Research Center;University of Wisconsin, Madison;IBM Almaden Research Center;IBM Almaden Research Center
Venue:
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year:
2002

Abstract

Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent nodes. We seek to automate the process of data partitioning. Given a workload of SQL statements, we seek to determine automatically how to partition the base data across multiple nodes to achieve overall optimal (or close to optimal) performance for that workload. Previous attempts use heuristic rules to make those decisions. These approaches fail to consider all of the interdependent aspects of query performance typically modeled by today's sophisticated query optimizers.

We present a comprehensive solution to the problem that has been tightly integrated with the optimizer of a commercial shared-nothing parallel database system. Our approach uses the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates. We compare a rank-based enumeration method with a random-based one. Our experimental results show that the former is more effective.
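
The idea of generating per-table candidate partitions and then searching over their combinations can be illustrated with a small toy sketch. It is not the paper's algorithm or its optimizer integration. The table names, column names, workload, the `estimate_cost` stand-in for an optimizer cost estimate, and the benefit-based ranking are all assumptions made up for this example.

```python
import itertools
import random

# Hypothetical candidate partitioning keys for each table.
CANDIDATES = {
    "orders": ["o_custkey", "o_orderkey"],
    "lineitem": ["l_orderkey", "l_partkey"],
    "customer": ["c_custkey", "c_nationkey"],
}

# Hypothetical workload of joins: (table_a, col_a, table_b, col_b, frequency).
WORKLOAD = [
    ("orders", "o_orderkey", "lineitem", "l_orderkey", 5),
    ("orders", "o_custkey", "customer", "c_custkey", 2),
]


def estimate_cost(config):
    """Toy stand-in for an optimizer estimate (an assumption, not a real API).

    A join costs 1 per execution when both tables are partitioned on the
    join columns (collocated) and 10 when data must be repartitioned.
    """
    total = 0
    for ta, ca, tb, cb, freq in WORKLOAD:
        collocated = config[ta] == ca and config[tb] == cb
        total += freq * (1 if collocated else 10)
    return total


def candidate_benefit(table, key):
    """Weighted number of workload joins that use `key` on `table`."""
    return sum(
        freq
        for ta, ca, tb, cb, freq in WORKLOAD
        if (ta, ca) == (table, key) or (tb, cb) == (table, key)
    )


def best_of(configs):
    """Return (cost, config) for the cheapest configuration evaluated."""
    return min(((estimate_cost(c), c) for c in configs), key=lambda pair: pair[0])


def rank_based(budget):
    """Evaluate the `budget` combinations with the highest total benefit."""
    tables = sorted(CANDIDATES)
    combos = sorted(
        itertools.product(*(CANDIDATES[t] for t in tables)),
        key=lambda combo: -sum(candidate_benefit(t, k) for t, k in zip(tables, combo)),
    )
    return best_of(dict(zip(tables, combo)) for combo in combos[:budget])


def random_based(budget, seed=0):
    """Evaluate `budget` configurations drawn uniformly at random."""
    rng = random.Random(seed)
    configs = [
        {t: rng.choice(keys) for t, keys in CANDIDATES.items()}
        for _ in range(budget)
    ]
    return best_of(configs)
```

Calling `rank_based(1)` evaluates only the top-ranked combination, while `random_based(4)` evaluates four random ones. In a real system the cost of each configuration would come from the query optimizer rather than from a hand-written rule like the one above.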