Massively-parallel stream processing under QoS constraints with Nephele

Authors:
Björn Lohrmann;Daniel Warneke;Odej Kao
Affiliations:
Technische Universität Berlin, Berlin, Germany;Technische Universität Berlin, Berlin, Germany;Technische Universität Berlin, Berlin, Germany
Venue:
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Year:
2012

Abstract

Today, a growing number of commodity devices, such as mobile phones and smart meters, are equipped with rich sensors and are capable of producing continuous data streams. The sheer number of these devices and the resulting overall data volume of the streams raise new challenges for the scalability of existing stream processing systems. At the same time, massively-parallel data processing systems like MapReduce have proven that they scale to large numbers of nodes and organize data transfers between them efficiently. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have so far disregarded the QoS requirements of prospective stream processing applications. In this paper we address this gap. First, we analyze common design principles of today's parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals of latency and throughput. Second, we propose a scheme that allows these frameworks to detect violations of user-defined latency constraints and to optimize job execution without manual interaction in order to meet these constraints while keeping throughput as high as possible. As a proof of concept, we implemented our approach for our parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For a multimedia streaming application, we demonstrate that processing latency improves by a factor of at least 15 while high data throughput is preserved when needed.