Dremel is a scalable, interactive ad hoc query system for analysis of read-only nested data. By combining multi-level execution trees with a columnar data layout, it can run aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand-node instances of the system.
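The combination of a columnar layout and a multi-level execution tree can be illustrated with a small sketch. This is not Dremel's implementation: the names `Tablet`, `leaf_scan`, `merge`, and `tree_aggregate`, the `fanout` parameter, and the sample data are all hypothetical, and the sketch uses flat columns only, so it does not show the nested-record representation the paper presents. It only shows the general idea that leaves read just the column a query needs and that partial aggregates are combined level by level on the way to the root.

```python
from typing import Dict, List, Tuple

# Hypothetical columnar tablet: each column is stored as its own list,
# so a query touches only the columns it names.
Tablet = Dict[str, List[int]]
Partial = Tuple[int, int]  # (sum, count)


def leaf_scan(tablet: Tablet, column: str) -> Partial:
    """Leaf node: read one column of one tablet and return a partial aggregate."""
    values = tablet[column]
    return sum(values), len(values)


def merge(partials: List[Partial]) -> Partial:
    """Intermediate node: combine the partial aggregates of its children."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


def tree_aggregate(tablets: List[Tablet], column: str, fanout: int = 2) -> Partial:
    """Aggregate through a tree: leaves scan, each level merges up to `fanout` children."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(t, column) for t in tablets]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else (0, 0)


# Made-up sample data, for illustration only.
tablets = [
    {"latency_ms": [10, 20], "bytes": [100, 200]},
    {"latency_ms": [30], "bytes": [300]},
    {"latency_ms": [40, 50, 60], "bytes": [400, 500, 600]},
]
total, count = tree_aggregate(tablets, "latency_ms")
print(total, count, total / count)  # 210 6 35.0
```

Because sum and count are associative, the result does not depend on the tree shape. In this sketch, changing `fanout` changes only how many merge levels sit between the leaves and the root.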