Flood: elastic streaming MapReduce

Authors:
David Alves; Pedro Bizarro; Paulo Marques
Affiliations:
CISUC/University of Coimbra; CISUC/University of Coimbra; CISUC/University of Coimbra
Venue:
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
Year:
2010

Abstract

Distributed data stream processing (DSP) is used to analyze information and raise alarms in business-critical scenarios such as financial fraud detection, clickstream processing, network security, traffic control, and real-time KPI computation. Processing this information efficiently is very challenging because continuous streaming sources vary over time: the volume of data and the amount of processing often change with the time of day and the day of the week, and frequently show unexpected spikes. As a result, most DSP computations are either over-provisioned, which increases cost and wastes energy, or under-provisioned, in which case they suffer performance degradation or denial of service, or must resort to load shedding. We demonstrate Flood, a scalable, elastic DSP engine that addresses these problems. By using a scalable computing model, MapReduce, and by monitoring running computations, our system can decide at runtime whether there is a lack or a surplus of resources. Flood then acts autonomically, requesting or releasing computing nodes without losing tuples or redoing computation, while ensuring that latency and throughput requirements are met.
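The runtime decision the abstract describes, detecting a lack or a surplus of resources from monitored computations, can be illustrated with a simple threshold rule. The sketch below is a hypothetical example and not Flood's actual policy: the metric names (`input_rate`, `capacity_per_node`, `p99_latency_ms`), the `scale_in_headroom` parameter and the `decide` function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SCALE_OUT = "request_node"   # lack of resources
    SCALE_IN = "release_node"    # surplus of resources
    HOLD = "hold"


@dataclass
class Metrics:
    """Hypothetical per-interval measurements of a running computation."""
    input_rate: float         # tuples/s arriving from the streaming sources
    capacity_per_node: float  # tuples/s one node can process (assumed measured)
    nodes: int                # nodes currently assigned to the computation
    p99_latency_ms: float     # observed end-to-end latency


@dataclass
class Requirements:
    """Hypothetical service requirements for the computation."""
    max_latency_ms: float           # latency bound that must be respected
    scale_in_headroom: float = 0.6  # release a node only if load stays below this fraction


def decide(m: Metrics, req: Requirements) -> Action:
    capacity = m.capacity_per_node * m.nodes
    # Lack of resources: tuples arrive faster than they can be processed,
    # or the latency requirement is already violated.
    if m.input_rate > capacity or m.p99_latency_ms > req.max_latency_ms:
        return Action.SCALE_OUT
    # Surplus of resources: even with one node fewer, the load would stay
    # comfortably below the reduced capacity.
    if m.nodes > 1:
        reduced = m.capacity_per_node * (m.nodes - 1)
        if m.input_rate < req.scale_in_headroom * reduced:
            return Action.SCALE_IN
    return Action.HOLD


req = Requirements(max_latency_ms=200.0)
print(decide(Metrics(12_000, 5_000, 2, 80.0), req))   # spike above capacity
print(decide(Metrics(4_000, 5_000, 3, 40.0), req))    # quiet period
print(decide(Metrics(8_000, 5_000, 2, 120.0), req))   # steady load
```

The gap between the scale-out condition and the `scale_in_headroom` threshold acts as hysteresis, so that a load hovering near one node's capacity does not cause nodes to be requested and released on alternate intervals.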