Massively parallel data analysis with PACTs on Nephele

Authors:
Alexander Alexandrov;Max Heimel;Volker Markl;Dominic Battré;Fabian Hueske;Erik Nijkamp;Stephan Ewen;Odej Kao;Daniel Warneke
Affiliations:
Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany;Technische Universität Berlin, Germany
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Citing 5

Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data

MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6

Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007

Nephele: efficient parallel data processing in the cloud
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers

Nephele/PACTs: a programming model and execution framework for web-scale analytical processing
Proceedings of the 1st ACM symposium on Cloud computing

Cited 5

Integrating open government data with stratosphere for more transparency
Web Semantics: Science, Services and Agents on the World Wide Web

The DEBS 2012 grand challenge
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems

A performance comparison of parallel DBMSs and MapReduce on large-scale text analytics
Proceedings of the 16th International Conference on Extending Database Technology

Issues in big data testing and benchmarking
Proceedings of the Sixth International Workshop on Testing Database Systems

The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)

Abstract

Large-scale data analysis applications require processing and analyzing terabytes or even petabytes of data, particularly in the areas of web analysis or scientific data management. This trend has been discussed as "web-scale data management" in a panel at VLDB 2009. Formerly, parallel data processing was the domain of parallel database systems. Today, novel requirements such as scaling out to thousands of machines, improved fault tolerance, and schema-free processing have made a case for new approaches.