Cluster I/O with River: making the fast case common
Proceedings of the sixth workshop on I/O in parallel and distributed systems
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Ad-hoc data processing in the cloud
Proceedings of the VLDB Endowment
Elastic complex event processing
Proceedings of the 8th Middleware Doctoral Symposium
Hi-index | 0.01 |
Distributed data stream processing (DSP) is used to analyze information and raise alarms in business-critical scenarios such as financial fraud-detection, clickstream processing, network security, traffic control, or real-time KPI computations. Processing this information efficiently is very challenging because the nature of continuous streaming sources is varying in nature: often the amount of data and processing changes with time of day and day of week and frequently has unexpected spikes. Thus, the result is that most DSP computations are either over-provisioned, introducing increased cost and wasted energy, or are under-provisioned and, either incur in performance degradation or denial-of-service, or have to resort to load shedding. We demonstrate Flood, a scalable, elastic DSP engine that addresses these problems. By using a scalable computing model, MapReduce, and adequately monitoring running computations our system is able to decide, in runtime, if there is a lack or a surplus of resources. Flood then acts autonomically by requesting or releasing computing nodes, without losing tuples or redoing computation, at the same time making sure that latency and throughput requirements are guaranteed.