Aggregation strategies for columnar in-memory databases in a mixed workload

Authors:
Stephan Müller;Hasso Plattner
Affiliations:
Hasso-Plattner-Institut, Potsdam, Germany;Hasso-Plattner-Institut, Potsdam, Germany
Venue:
Proceedings of the 4th Workshop for Ph.D. Students in Information & Knowledge Management
Year:
2011

Abstract

The recent trend towards analytics on operational data has led to the reunification of online transactional processing (OLTP) and online analytical processing (OLAP) in a single database. Columnar in-memory databases make this feasible, because they perform expensive join and aggregation operations with better performance than traditional row-oriented databases. This has led to the radical proposal of abandoning materialized aggregate tables and calculating all aggregations on the fly. This PhD research project investigates the factors that influence aggregation performance in columnar in-memory databases. Based on the identified factors, we aim to evaluate different cost model approaches, which will be validated with real-life data from large industry customers and their mixed workloads. The goal of the project is to design and implement an aggregation engine that decides whether or not to materialize an aggregate. The decision is based on data and application characteristics, the historic and current workload, and other cost-relevant factors, and it weighs the benefit to query performance against the cost of maintaining the aggregate view.
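The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the cost model proposed in the project; it is a deliberately simplified linear comparison, and every name and parameter in it (`WorkloadStats`, `scan_cost`, `lookup_cost`, `maintenance_cost`) is a hypothetical placeholder chosen for illustration. It assumes that each read either recomputes the aggregate or looks up the materialized result, and that each write to the base table triggers one maintenance step on the materialized aggregate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadStats:
    """Hypothetical workload summary for one candidate aggregate."""
    reads: int               # aggregate queries in the observation window
    writes: int              # inserts/updates touching the base table
    scan_cost: float         # cost per query when aggregating on the fly
    lookup_cost: float       # cost per query when reading the materialized aggregate
    maintenance_cost: float  # cost per write for keeping the aggregate up to date


def on_the_fly_cost(s: WorkloadStats) -> float:
    """Total cost if every query recomputes the aggregate from base data."""
    return s.reads * s.scan_cost


def materialized_cost(s: WorkloadStats) -> float:
    """Total cost if the aggregate is materialized and maintained on each write."""
    return s.reads * s.lookup_cost + s.writes * s.maintenance_cost


def should_materialize(s: WorkloadStats) -> bool:
    """Materialize only if it is strictly cheaper over the observed workload."""
    return materialized_cost(s) < on_the_fly_cost(s)


# Illustrative numbers only; they are not measurements.
read_heavy = WorkloadStats(reads=1000, writes=100, scan_cost=5.0,
                           lookup_cost=0.1, maintenance_cost=1.0)
write_heavy = WorkloadStats(reads=10, writes=10000, scan_cost=5.0,
                            lookup_cost=0.1, maintenance_cost=1.0)

print(should_materialize(read_heavy))   # read-dominated workload
print(should_materialize(write_heavy))  # write-dominated workload
```

In this toy model the read-dominated workload favors materialization and the write-dominated one favors on-the-fly aggregation. A realistic engine would replace the constant per-operation costs with estimates derived from data characteristics and the observed workload, as the abstract outlines.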