An in-depth analysis of data aggregation cost factors in a columnar in-memory database

Authors:
Stephan Müller;Hasso Plattner
Affiliations:
Hasso-Plattner-Institut, University of Potsdam, Potsdam, Germany;Hasso-Plattner-Institut, University of Potsdam, Potsdam, Germany
Venue:
Proceedings of the fifteenth international workshop on Data warehousing and OLAP
Year:
2012

Abstract

Precise prediction of query execution performance is the basis for various database optimization strategies. With columnar in-memory databases, cost modeling changes in two dimensions. First, models for disk-based databases are not well suited because the new bottleneck is main memory access. Second, the ability to execute mixed workloads creates new challenges. For transactional and analytical queries with aggregation operations, memory access patterns, and thus execution times, vary significantly. This paper discusses the influence of data characteristics on aggregation operations and highlights factors that existing cost model approaches do not consider. Further, we present benchmarks implemented and executed on a columnar in-memory research database to support our assumptions.