Column-oriented database systems have been a real game changer for the industry in recent years. Highly tuned, performant systems have evolved that let users answer ad hoc queries over large datasets interactively. In this paper we present the column-oriented datastore developed as one of the central components of PowerDrill. It combines the advantages of a columnar data layout with other known techniques (such as composite range partitions) and extensive algorithmic engineering on key data structures. The main goal of this engineering is to reduce the main-memory footprint and to increase the efficiency of processing typical user queries. Together, these techniques yield large speed-ups, which enable a highly interactive Web UI in which a single mouse click commonly leads to processing a trillion values in the underlying dataset.
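The abstract does not say how PowerDrill's composite range partitions are built or used. As a rough illustration only, the sketch below assumes rows are split by ranges over a composite key (the hypothetical columns `country` and `date`), that each partition stores its values column by column, and that a query filtering on the leading key column skips partitions whose key range cannot contain a match. All names (`build_partitions`, `sum_where_first_key_equals`, the sample columns and data) are hypothetical; this is not PowerDrill's implementation.

```python
from bisect import bisect_right

# Hypothetical sketch (not PowerDrill's code): composite range partitioning
# with a columnar layout inside each partition.


def build_partitions(rows, key_cols, split_points):
    """Assign each row to the range partition that contains its composite key.

    split_points is a sorted list of key tuples; partition i holds keys k with
    split_points[i-1] <= k < split_points[i] (open-ended at both extremes).
    Each partition is a dict mapping column name -> list of values.
    """
    parts = [dict() for _ in range(len(split_points) + 1)]
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        idx = bisect_right(split_points, key)  # number of split points <= key
        part = parts[idx]
        for name, value in row.items():
            part.setdefault(name, []).append(value)  # store column by column
    return parts


def sum_where_first_key_equals(parts, split_points, key_col, value, metric_col):
    """Sum metric_col over rows whose leading key column equals value.

    Returns (total, number_of_partitions_scanned).
    """
    bounds = [None] + list(split_points) + [None]
    total, scanned = 0, 0
    for i, part in enumerate(parts):
        lo, hi = bounds[i], bounds[i + 1]
        # Prune conservatively: every key here is < hi, and any key whose first
        # component equals `value` is >= (value,), so hi <= (value,) rules the
        # partition out; likewise lo[0] > value rules it out from below.
        if (hi is not None and hi <= (value,)) or (lo is not None and lo[0] > value):
            continue
        scanned += 1
        keys = part.get(key_col, [])
        metrics = part.get(metric_col, [])
        total += sum(m for k, m in zip(keys, metrics) if k == value)
    return total, scanned


# Illustrative data only.
rows = [
    {"country": "de", "date": 1, "clicks": 3},
    {"country": "de", "date": 2, "clicks": 4},
    {"country": "fr", "date": 1, "clicks": 5},
    {"country": "us", "date": 1, "clicks": 7},
    {"country": "us", "date": 3, "clicks": 1},
]
split_points = [("fr",), ("us",)]
parts = build_partitions(rows, ["country", "date"], split_points)
print(sum_where_first_key_equals(parts, split_points, "country", "us", "clicks"))  # → (8, 1)
```

Pruning here only skips a partition when its bounds prove it holds no matching key, so the result equals a full scan. The saving is that only one of the three partitions is read for this query.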