Data morphing: an adaptive, cache-conscious storage technique

Authors:
Richard A. Hankins; Jignesh M. Patel
Affiliations:
University of Michigan, Ann Arbor, MI; University of Michigan, Ann Arbor, MI
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003



Abstract

The number of processor cache misses has a critical impact on the performance of DBMSs running on servers with large main-memory configurations. In turn, the cache utilization of database systems is highly dependent on the physical organization of the records in main memory. A recently proposed storage model, called PAX, was shown to greatly improve the performance of sequential file-scan operations when compared to the commonly implemented N-ary storage model. However, the PAX storage model can also demonstrate poor cache utilization for other common operations, such as index scans. Under a workload of heterogeneous database operations, neither the PAX storage model nor the N-ary storage model is optimal. In this paper, we propose a flexible data storage technique called Data Morphing. Using Data Morphing, a cache-efficient attribute layout, called a partition, is first determined through an analysis of the query workload. This partition is then used as a template for storing data in a cache-efficient way. We present two algorithms for computing partitions, and also present a versatile storage model that accommodates the dynamic reorganization of the attributes in a file. Finally, we experimentally demonstrate that the Data Morphing technique provides a significant performance improvement over both the traditional N-ary storage model and the PAX model.
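
To make the trade-off in the abstract concrete, the following is a minimal sketch of a toy cost model. It is not the paper's algorithm or storage model. It treats a layout as a partition of the attribute set into groups stored together, and compares three layouts: N-ary (one group holding every attribute), PAX-like (one group per attribute), and a hand-picked hybrid partition. The attribute names, sizes, query, and both cost functions (`groups_touched`, `bytes_scanned_per_record`) are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# A partition is a list of attribute groups; attributes in one group
# are assumed to be stored contiguously for each record.
Partition = List[FrozenSet[str]]


def groups_touched(partition: Partition, query_attrs: Set[str]) -> int:
    """Number of groups a single-record lookup must visit.

    Rough proxy (an assumption of this sketch) for cache misses when one
    record is fetched at random, as in an index scan.
    """
    return sum(1 for group in partition if group & query_attrs)


def bytes_scanned_per_record(
    partition: Partition, query_attrs: Set[str], sizes: Dict[str, int]
) -> int:
    """Bytes pulled through the cache per record in a sequential scan.

    Every attribute of a touched group is counted, wanted or not, since
    the group is stored contiguously.
    """
    return sum(
        sizes[attr]
        for group in partition
        if group & query_attrs
        for attr in group
    )


# Hypothetical relation: attribute name -> size in bytes.
sizes = {"id": 4, "name": 32, "price": 8, "qty": 4}

nsm = [frozenset(sizes)]                        # all attributes together
pax = [frozenset([attr]) for attr in sizes]     # one group per attribute
hybrid = [frozenset({"id", "name"}), frozenset({"price", "qty"})]

query = {"price", "qty"}  # hypothetical query touching two attributes

for label, layout in (("N-ary", nsm), ("PAX-like", pax), ("hybrid", hybrid)):
    print(
        label,
        groups_touched(layout, query),
        bytes_scanned_per_record(layout, query, sizes),
    )
```

Under these assumptions, the N-ary layout touches one group but drags all 48 bytes of each record through the cache, the PAX-like layout reads only the 12 needed bytes but touches two groups per record, and the hybrid partition touches one group and reads 12 bytes. This mirrors the abstract's point that a workload-derived partition can avoid the weak case of each fixed layout.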