Adaptive online scheduling in Storm

Authors:
Leonardo Aniello;Roberto Baldoni;Leonardo Querzoni
Affiliations:
Sapienza Università di Roma, Roma, Italy;Sapienza Università di Roma, Roma, Italy;Sapienza Università di Roma, Roma, Italy
Venue:
Proceedings of the 7th ACM international conference on Distributed event-based systems
Year:
2013

Abstract

Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to analyze huge amounts of data efficiently and in a timely manner marks the difference between industrial success stories and catastrophic failures. In this scenario, Storm, an open source distributed realtime computation system, represents a disruptive technology that is quickly gaining the favor of big players like Twitter and Groupon. A Storm application is modeled as a topology, i.e., a graph where nodes are operators and edges represent data flows among those operators. A key aspect of tuning Storm performance lies in the strategy used to deploy a topology, i.e., how Storm schedules the execution of each topology component on the available computing infrastructure. In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second scheduler enhances the previous approach by continuously monitoring system performance and rescheduling the deployment at run-time to improve overall performance. Experimental results show that these algorithms can produce schedules that achieve significantly better performance than those produced by Storm's default scheduler.
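
The abstract does not spell out how either scheduler works, so the following is only a minimal sketch of the general idea of topology-aware placement, not the authors' algorithm and not Storm's API. It assumes a toy model in which a topology is a list of operator names plus directed edges, each node has a fixed number of slots, and a good schedule is one with few edges crossing node boundaries. The function names (`round_robin_schedule`, `topology_aware_schedule`, `cross_node_edges`), the greedy heuristic, and the example topology are all hypothetical.

```python
from collections import defaultdict


def round_robin_schedule(operators, num_nodes):
    """Baseline that ignores the graph: operators go to nodes in turn."""
    return {op: i % num_nodes for i, op in enumerate(operators)}


def topology_aware_schedule(operators, edges, num_nodes, capacity):
    """Greedy heuristic (an assumption, not the paper's method): put each
    operator on the node that already hosts most of its neighbours,
    without exceeding `capacity` operators per node."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    load = [0] * num_nodes
    placement = {}
    for op in operators:
        candidates = [n for n in range(num_nodes) if load[n] < capacity]
        if not candidates:
            raise ValueError("not enough capacity for all operators")

        def score(node):
            colocated = sum(1 for nb in neighbours[op] if placement.get(nb) == node)
            # More co-located neighbours first, then lower load, then lower index.
            return (colocated, -load[node], -node)

        best = max(candidates, key=score)
        placement[op] = best
        load[best] += 1
    return placement


def cross_node_edges(edges, placement):
    """Count data flows whose endpoints sit on different nodes."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


# Hypothetical four-operator pipeline deployed on two nodes with two slots each.
operators = ["spout", "split", "count", "report"]
edges = [("spout", "split"), ("split", "count"), ("count", "report")]

baseline = round_robin_schedule(operators, num_nodes=2)
aware = topology_aware_schedule(operators, edges, num_nodes=2, capacity=2)

print("round robin cross-node edges:", cross_node_edges(edges, baseline))
print("topology-aware cross-node edges:", cross_node_edges(edges, aware))
```

In this toy setup the graph-aware placement keeps adjacent operators together, so fewer data flows cross the network than with the round-robin baseline. An online variant in the spirit of the second scheduler would recompute the placement periodically from measured traffic rather than from the static edge list; that part is not sketched here.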