Processing a trillion cells per mouse click

Authors:
Alexander Hall; Olaf Bachmann; Robert Büssow; Silviu Gănceanu; Marc Nunkesser
Affiliations:
Google, Inc. (all authors)
Venue:
Proceedings of the VLDB Endowment
Year:
2012


Abstract

Column-oriented database systems have been a real game changer for the industry in recent years. Highly tuned and performant systems have evolved that let users answer ad hoc queries over large datasets interactively. In this paper we present the column-oriented datastore developed as one of the central components of PowerDrill. It combines the advantages of a columnar data layout with other known techniques (such as using composite range partitions) and extensive algorithmic engineering of key data structures. The main goals of this engineering are to reduce the main-memory footprint and to increase the efficiency of processing typical user queries. Together, these techniques yield large speed-ups, which enable a highly interactive Web UI in which a single mouse click commonly leads to processing a trillion values in the underlying dataset.