In recent years, cloud computing has emerged as a promising new approach for ad-hoc parallel data processing. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use stem from the field of cluster computing and disregard the particular nature of a cloud. As a result, the allocated compute resources may be inadequate for large parts of the submitted job and may unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our ongoing research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's compute clouds for both task scheduling and execution. It allows the individual tasks of a processing job to be assigned to different types of virtual machines and takes care of their instantiation and termination during job execution. Based on this new framework, we perform evaluations on a compute cloud system and compare the results to the existing data processing framework Hadoop.
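The idea of starting and stopping virtual machines per task, rather than holding a fixed pool for the whole job, can be illustrated with a minimal Python sketch. This is not Nephele's API: the `Task` class, the VM type labels, the running times, and all function names are hypothetical. The sketch assumes one VM per task, strictly sequential stages, and a stage that lasts as long as its slowest task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    vm_type: str   # hypothetical instance type label, e.g. "small" or "large"
    stage: int     # position of the task in the job's pipeline (assumed sequential)
    hours: float   # assumed running time of the task


def stage_durations(tasks):
    """Duration of each stage, taken as the running time of its slowest task."""
    durations = {}
    for t in tasks:
        durations[t.stage] = max(durations.get(t.stage, 0.0), t.hours)
    return durations


def static_vm_hours(tasks):
    """VM-hours when every task's VM is held for the entire job."""
    return len(tasks) * sum(stage_durations(tasks).values())


def dynamic_vm_hours(tasks):
    """VM-hours when each VM exists only while its own stage is running."""
    durations = stage_durations(tasks)
    return sum(durations[t.stage] for t in tasks)


def lifecycle_events(tasks):
    """Ordered (stage, action, vm_type, task name) events for dynamic allocation."""
    events = []
    for stage in sorted({t.stage for t in tasks}):
        members = [t for t in tasks if t.stage == stage]
        events += [(stage, "start", t.vm_type, t.name) for t in members]
        events += [(stage, "stop", t.vm_type, t.name) for t in members]
    return events


# Illustrative job: the middle stage needs a larger VM than the other two.
JOB = [
    Task("read", "small", 0, 1.0),
    Task("sort", "large", 1, 2.0),
    Task("write", "small", 2, 0.5),
]

if __name__ == "__main__":
    for event in lifecycle_events(JOB):
        print(event)
    print("static VM-hours:", static_vm_hours(JOB))
    print("dynamic VM-hours:", dynamic_vm_hours(JOB))
```

Under these assumptions, the gap between the two totals is the idle time that a static allocation would pay for. A real scheduler would also have to account for VM startup latency and billing granularity, which this sketch ignores.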