The rapid growth of fast analytics systems that process data in memory makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called the Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction, which forms the basis not only of batch-processing models like MapReduce but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21–42% less memory than aggregation using Google SparseHash, with up to 16% better throughput. The CBT is also compared to the batch-mode aggregators in MapReduce runtimes such as Phoenix++ and Metis: it consumes 4x and 5x less memory while delivering 1.5–2x and 3–4x better performance, respectively.
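To make the GroupBy-Aggregate abstraction concrete, the following is a minimal Python sketch written for illustration only. It is not the CBT and does not reproduce the paper's design. The names `group_by_aggregate`, `BufferedCompressedCounter`, and `buffer_limit` are hypothetical, and the choice of `pickle` and `zlib` is an assumption. The first function shows plain hash-based aggregation; the class is a toy that only gestures at the idea of buffering partial aggregates and holding them in serialized, compressed form.

```python
import pickle
import zlib
from collections import Counter


def group_by_aggregate(records, key_fn, value_fn, combine_fn):
    """Group records by key and fold each group's values with combine_fn."""
    groups = {}
    for record in records:
        key = key_fn(record)
        value = value_fn(record)
        # First value for a key is stored as-is; later values are combined.
        groups[key] = combine_fn(groups[key], value) if key in groups else value
    return groups


class BufferedCompressedCounter:
    """Toy counter (hypothetical, not the CBT): buffers partial counts,
    then keeps them as serialized, compressed chunks until the final merge."""

    def __init__(self, buffer_limit=1024):
        self.buffer_limit = buffer_limit  # max distinct keys held uncompressed
        self.buffer = Counter()
        self.chunks = []  # compressed, serialized partial aggregates

    def add(self, key):
        self.buffer[key] += 1
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self):
        # Serialize the partial counts, compress them, and empty the buffer.
        if self.buffer:
            payload = pickle.dumps(dict(self.buffer))
            self.chunks.append(zlib.compress(payload))
            self.buffer = Counter()

    def result(self):
        # Merge all partial aggregates into the final per-key counts.
        self._flush()
        total = Counter()
        for chunk in self.chunks:
            total.update(pickle.loads(zlib.decompress(chunk)))
        return dict(total)
```

As a usage example, counting words with `group_by_aggregate(words, lambda w: w, lambda w: 1, lambda x, y: x + y)` and feeding the same words to `BufferedCompressedCounter.add` produce the same per-key counts; the toy class simply defers the merge so that partial results sit in compressed form in the meantime.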