A recursive algebra and query optimization for nested relations
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
XMill: an efficient compressor for XML data
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Foundations of Databases: The Logical Level
Foundations of Databases: The Logical Level
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
ORDPATHs: insert-friendly XML node labels
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Challenges in building large-scale information retrieval systems: invited talk
Proceedings of the Second ACM International Conference on Web Search and Data Mining
MapReduce and parallel DBMSs: friends or foes?
Communications of the ACM - Amir Pnueli: Ahead of His Time
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Column-oriented database systems
Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
Processing a trillion cells per mouse click
Proceedings of the VLDB Endowment
MTCProv: a practical provenance query framework for many-task scientific computing
Distributed and Parallel Databases
BlinkDB: queries with bounded errors and bounded response times on very large data
Proceedings of the 8th ACM European Conference on Computer Systems
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
Hi-index | 48.22 |
Dremel is a scalable, interactive ad hoc query system for analysis of read-only nested data. By combining multilevel execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.