Flood: elastic streaming MapReduce

  • Authors:
  • David Alves; Pedro Bizarro; Paulo Marques

  • Affiliations:
  • CISUC/University of Coimbra; CISUC/University of Coimbra; CISUC/University of Coimbra

  • Venue:
  • Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
  • Year:
  • 2010

Abstract

Distributed data stream processing (DSP) is used to analyze information and raise alarms in business-critical scenarios such as financial fraud detection, clickstream processing, network security, traffic control, and real-time KPI computation. Processing this information efficiently is challenging because the load from continuous streaming sources varies: the amount of data and processing often changes with the time of day and day of the week, and frequently shows unexpected spikes. As a result, most DSP computations are either over-provisioned, which increases cost and wastes energy, or under-provisioned, in which case they either suffer performance degradation or denial of service, or have to resort to load shedding. We demonstrate Flood, a scalable, elastic DSP engine that addresses these problems. By using a scalable computing model, MapReduce, and appropriately monitoring running computations, our system is able to decide, at runtime, whether there is a lack or a surplus of resources. Flood then acts autonomically by requesting or releasing computing nodes, without losing tuples or redoing computation, while ensuring that latency and throughput requirements are met.
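
The abstract does not say how Flood decides that resources are lacking or in surplus. As an illustration only, the following is a minimal sketch of one possible threshold-based elasticity decision. Every name and threshold in it (`Metrics`, `Requirements`, `scaling_decision`, the utilization limits) is a hypothetical assumption and is not taken from Flood.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """One monitoring sample for a running computation (hypothetical fields)."""
    latency_ms: float    # observed end-to-end latency
    utilization: float   # mean utilization across nodes, in [0, 1]


@dataclass
class Requirements:
    """Targets the computation must meet (hypothetical fields)."""
    max_latency_ms: float
    max_utilization: float = 0.8       # above this, treat the cluster as short of resources
    safe_latency_fraction: float = 0.5  # release only if latency is well under target


def scaling_decision(m: Metrics, req: Requirements, nodes: int) -> int:
    """Return the change in node count: +1 request, -1 release, 0 hold.

    Illustrative policy only; not the policy described by the Flood authors.
    """
    # Lack of resources: the latency target is violated or nodes are saturated.
    if m.latency_ms > req.max_latency_ms or m.utilization > req.max_utilization:
        return 1

    # Surplus: release a node only if the remaining nodes could absorb the
    # current load and latency has comfortable slack.
    if nodes > 1:
        projected = m.utilization * nodes / (nodes - 1)
        has_slack = m.latency_ms < req.safe_latency_fraction * req.max_latency_ms
        if projected < req.max_utilization and has_slack:
            return -1

    return 0
```

A monitoring loop could call `scaling_decision` on each sample and request or release one node at a time. Moving one node per step, and requiring latency slack before releasing, is a design choice made here to avoid oscillation; a real engine would also need to migrate state so that no tuples are lost, which this sketch does not cover.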