A bridging model for parallel computation
Communications of the ACM
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Proceedings of the VLDB Endowment
Nephele/PACTs: a programming model and execution framework for web-scale analytical processing
Proceedings of the 1st ACM symposium on Cloud computing
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Opening the black boxes in data flow optimization
Proceedings of the VLDB Endowment
Spinning fast iterative data flows
Proceedings of the VLDB Endowment
Large-scale social-media analytics on stratosphere
Proceedings of the 22nd international conference on World Wide Web companion
Hi-index | 0.00 |
Iterative algorithms occur in many domains of data analysis, such as machine learning or graph analysis. With increasing interest to run those algorithms on very large data sets, we see a need for new techniques to execute iterations in a massively parallel fashion. In prior work, we have shown how to extend and use a parallel data flow system to efficiently run iterative algorithms in a shared-nothing environment. Our approach supports the incremental processing nature of many of those algorithms. In this demonstration proposal we illustrate the process of implementing, compiling, optimizing, and executing iterative algorithms on Stratosphere using examples from graph analysis and machine learning. For the first step, we show the algorithm's code and a visualization of the produced data flow programs. The second step shows the optimizer's execution plan choices, while the last phase monitors the execution of the program, visualizing the state of the operators and additional metrics, such as per-iteration runtime and number of updates. To show that the data flow abstraction supports easy creation of custom programming APIs, we also present programs written against a lightweight Pregel API that is layered on top of our system with a small programming effort.