Cube computation over massive datasets is critical for many important real-world analyses. Algebraic measures such as SUM are commonly studied and amenable to parallel computation. Holistic measures such as TOP-K are different: computing their cubes efficiently is non-trivial and often impossible with current methods. In this paper we detail real-world challenges in cube materialization tasks on Web-scale datasets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation on these measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that existing techniques cannot scale to the 100-million-tuple mark for our datasets, whereas MR-Cube successfully and efficiently computes cubes with holistic measures over billion-tuple datasets.
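As a rough illustration of the algebraic/holistic distinction, the sketch below simulates a naive MapReduce cube computation in plain Python. It is not MR-Cube. The toy rows, the dimension names, the helpers `map_phase`, `shuffle` and `reduce_phase`, and the use of a distinct-user count as the holistic measure are all hypothetical choices made for illustration. In this naive scheme, each tuple is emitted once per cuboid (2^d times for d dimensions). SUM for a coarse group can be rebuilt from finer partial sums, but a distinct count cannot.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (country, topic, user_id, clicks)
ROWS = [
    ("US", "sports", "u1", 3),
    ("US", "sports", "u2", 1),
    ("US", "news", "u1", 2),
    ("UK", "news", "u3", 5),
    ("UK", "news", "u1", 4),
]
DIMS = ("country", "topic")  # hypothetical dimension names


def cuboids(n_dims):
    """Yield every subset of dimension indices, i.e. every cuboid in the lattice."""
    for r in range(n_dims + 1):
        yield from combinations(range(n_dims), r)


def map_phase(rows):
    """Map: emit one (group key, value) pair per cuboid for each input tuple."""
    for country, topic, user, clicks in rows:
        dims = (country, topic)
        for cub in cuboids(len(DIMS)):
            # '*' marks a dimension that is rolled up in this cuboid
            key = tuple(dims[i] if i in cub else "*" for i in range(len(DIMS)))
            yield key, (user, clicks)


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: compute SUM(clicks) (algebraic) and distinct users (holistic)."""
    result = {}
    for key, values in groups.items():
        total_clicks = sum(c for _, c in values)
        distinct_users = len({u for u, _ in values})
        result[key] = (total_clicks, distinct_users)
    return result


cube = reduce_phase(shuffle(map_phase(ROWS)))

# SUM is algebraic: the grand total equals the sum of per-country partial sums.
by_country = [v for k, v in cube.items() if k[0] != "*" and k[1] == "*"]
assert sum(c for c, _ in by_country) == cube[("*", "*")][0]

# Distinct count is holistic: per-country counts cannot simply be added,
# because user u1 appears under both US and UK.
print(sum(r for _, r in by_country), cube[("*", "*")][1])  # 4 3
```

Note that the `("*", "*")` group receives every input tuple, so a single reducer must hold all of its values to compute the holistic measure. With billions of tuples, this kind of skewed, non-decomposable group is one plausible source of the scaling limits the abstract describes.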