Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to analyze huge amounts of data efficiently and in a timely manner marks the difference between industrial success stories and catastrophic failures. In this scenario Storm, an open source distributed real-time computation system, represents a disruptive technology that is quickly gaining the favor of big players like Twitter and Groupon. A Storm application is modeled as a topology, i.e., a graph where nodes are operators and edges represent data flows among those operators. A key aspect in tuning Storm performance lies in the strategy used to deploy a topology, i.e., how Storm schedules the execution of each topology component on the available computing infrastructure. In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second scheduler enhances the previous approach by continuously monitoring system performance and rescheduling the deployment at run-time to improve overall performance. Experimental results show that these algorithms can produce schedules that achieve significantly better performance than those produced by Storm's default scheduler.
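To make the idea of adapting a deployment to the topology structure concrete, the following minimal sketch compares a structure-agnostic round-robin placement with a greedy placement that tries to keep connected operators on the same node. It is not the algorithm proposed in the paper and it does not use Storm's API: the function names, the per-node capacity rule, and the example topology are all assumptions made for illustration.

```python
from collections import defaultdict


def round_robin_schedule(components, num_nodes):
    """Baseline (hypothetical): assign components to nodes in turn,
    ignoring how the components are connected."""
    return {c: i % num_nodes for i, c in enumerate(components)}


def topology_aware_schedule(components, edges, num_nodes):
    """Greedy sketch (hypothetical): visit components in the given order,
    assumed to be topological, and place each one on the node that already
    hosts most of its upstream neighbours, subject to a per-node capacity."""
    # Assumed capacity rule: spread components evenly (ceiling division).
    capacity = -(-len(components) // num_nodes)
    upstream = defaultdict(list)
    for src, dst in edges:
        upstream[dst].append(src)

    load = [0] * num_nodes
    placement = {}
    for c in components:
        candidates = [n for n in range(num_nodes) if load[n] < capacity]
        # Prefer co-location with upstream operators, then the least
        # loaded node, then the lowest node index.
        best = max(
            candidates,
            key=lambda n: (
                sum(1 for u in upstream[c] if placement.get(u) == n),
                -load[n],
                -n,
            ),
        )
        placement[c] = best
        load[best] += 1
    return placement


def inter_node_edges(edges, placement):
    """Count data flows that cross node boundaries."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


# Example topology (made up): a linear chain of four operators.
components = ["source", "parse", "count", "sink"]
edges = [("source", "parse"), ("parse", "count"), ("count", "sink")]

baseline = round_robin_schedule(components, num_nodes=2)
aware = topology_aware_schedule(components, edges, num_nodes=2)
print("round-robin crossings:", inter_node_edges(edges, baseline))
print("topology-aware crossings:", inter_node_edges(edges, aware))
```

The number of edges that cross node boundaries is used here only as a simple proxy for network traffic. An online variant in the spirit of the second scheduler would presumably replace these static edge counts with traffic measured at run-time, which this sketch does not attempt.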