Instant loading for main memory databases

Authors:
Tobias Mühlbauer; Wolf Rödiger; Robert Seilbeck; Angelika Reiser; Alfons Kemper; Thomas Neumann
Affiliations:
Technische Universität München, Munich, Germany (all authors)
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

eScience and big data analytics applications face the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, the data must first be bulk loaded, an operation whose performance largely depends on the wire speed of the data source and the speed of the data sink, i.e., the disk. As the speed of network adapters and disks has stagnated in the past, loading has become a major bottleneck. The delays it causes are now ubiquitous, as text formats are a preferred storage format for reasons of portability. But the game has changed: ever-increasing main memory capacities have fostered the development of in-memory database systems, and very fast network infrastructures are on the verge of becoming economical. While hardware limitations for fast loading have disappeared, current approaches for main memory databases fail to saturate the now available wire speeds of tens of Gbit/s. With Instant Loading, we contribute a novel CSV loading approach that allows scalable bulk loading at wire speed. This is achieved by optimizing all phases of loading for modern super-scalar multi-core CPUs. Large main memory capacities and Instant Loading thereby facilitate a very efficient data staging processing model consisting of instantaneous load-work-unload cycles across data archives on a single node. Once data is loaded, updates and queries are efficiently processed with the flexibility, security, and high performance of relational main memory databases.
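Scaling CSV loading across cores requires splitting the input so that each core can parse its share independently. The sketch below illustrates one common way to do this; it is not the paper's implementation, which targets native code on super-scalar CPUs. It cuts the buffer at roughly equal offsets, moves each cut forward to the next newline so that no record is split, and parses the chunks independently. The helper names (`split_chunks`, `parse_chunk`, `load_parallel`) and the two-column `(id, name)` schema are hypothetical. The sketch also assumes that records end with `\n` and that no quoted field contains an embedded newline.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data: bytes, num_chunks: int) -> list[bytes]:
    """Split a CSV buffer into at most num_chunks pieces cut at record boundaries.

    Assumption: records end with b"\\n" and no quoted field contains a newline,
    so every newline is a safe place to cut.
    """
    chunks = []
    size = len(data)
    start = 0
    for i in range(1, num_chunks):
        if start >= size:
            break
        # Tentative equal-size split point, never behind the current start.
        target = max(start, size * i // num_chunks)
        # Move the cut forward to just past the next newline.
        nl = data.find(b"\n", target)
        if nl == -1:
            break
        chunks.append(data[start:nl + 1])
        start = nl + 1
    if start < size:
        chunks.append(data[start:])  # remainder goes into the last chunk
    return chunks


def parse_chunk(chunk: bytes) -> list[tuple[int, str]]:
    """Parse one chunk into (id, name) tuples (hypothetical schema)."""
    # Cutting at b"\n" is UTF-8 safe: byte 0x0A never occurs inside a multibyte character.
    text = chunk.decode("utf-8")
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if rec:  # skip empty lines
            rows.append((int(rec[0]), rec[1]))
    return rows


def load_parallel(data: bytes, workers: int = 4) -> list[tuple[int, str]]:
    """Partition the buffer, parse the chunks concurrently, and concatenate the results."""
    chunks = split_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the tuple order of the file is preserved.
        parts = list(pool.map(parse_chunk, chunks))
    return [row for part in parts for row in part]
```

For example, `load_parallel(b'1,alice\n2,bob\n3,"carol, jr."\n4,dave\n', workers=3)` returns the four tuples in file order. The quoted comma in `"carol, jr."` is handled correctly because each chunk starts at a record boundary. In CPython, threads do not speed up this pure-Python parsing because of the global interpreter lock, so the sketch only shows the partitioning scheme. Real parallel speedup would require processes or native code.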