AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning

Authors:
Stratos Papadomanolakis; Anastassia Ailamaki
Affiliations:
Carnegie Mellon University; Carnegie Mellon University
Venue:
SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Year:
2004

Abstract

Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however, the data volume and the continuous insertion of new data allow for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper we present AutoPart, an algorithm that automatically partitions database tables to optimize sequential access assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart we built an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments demonstrate the benefits of partitioning for large-scale systems: partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20% (for queries) and a factor of five (for updates), while using only half the original index space.
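
To make the idea of workload-driven partitioning for sequential access concrete, the following is a minimal illustrative sketch and not the AutoPart algorithm itself, whose details the abstract does not give. It assumes a simplified model: each query is represented only by the set of columns it reads, columns read by exactly the same queries are grouped into one vertical fragment, and a query sequentially scans every fragment it touches. The column names, byte widths, row count, and the helper functions `atomic_fragments` and `bytes_scanned` are all hypothetical, and the cost model ignores join costs and any key columns needed to reassemble rows.

```python
from collections import defaultdict


def atomic_fragments(columns, workload):
    """Group columns that are read by exactly the same set of queries."""
    groups = defaultdict(list)
    for col in columns:
        # The "signature" of a column is the set of query indexes that read it.
        signature = frozenset(i for i, q in enumerate(workload) if col in q)
        groups[signature].append(col)
    return [tuple(cols) for cols in groups.values()]


def bytes_scanned(fragments, widths, workload, rows):
    """Bytes read if each query sequentially scans every fragment it touches."""
    total = 0
    for query in workload:
        for frag in fragments:
            if query & set(frag):
                total += rows * sum(widths[c] for c in frag)
    return total


# Hypothetical table: column name -> width in bytes (assumed values).
widths = {"objID": 8, "ra": 8, "dec": 8, "u": 4, "g": 4, "r": 4}
columns = list(widths)
rows = 1000

# Hypothetical representative workload: the columns each query reads.
workload = [
    {"objID", "ra", "dec"},
    {"objID", "u", "g", "r"},
    {"ra", "dec"},
]

original = [tuple(columns)]  # one wide, unpartitioned table
partitioned = atomic_fragments(columns, workload)

original_cost = bytes_scanned(original, widths, workload, rows)
partitioned_cost = bytes_scanned(partitioned, widths, workload, rows)

print(partitioned)
print(original_cost, partitioned_cost)
```

Under this toy model each column is stored exactly once, which mirrors the abstract's point that partitioning does not replicate data, and queries that read only a few columns scan fewer bytes than they would against the original wide table.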