Project Hoover: auto-scaling streaming map-reduce applications

Authors:
Rajalakshmi Ramesh;Liting Hu;Karsten Schwan
Affiliations:
Georgia Institute of Technology, Atlanta, USA;Georgia Institute of Technology, Atlanta, USA;Georgia Institute of Technology, Atlanta, USA
Venue:
Proceedings of the 2012 workshop on Management of big data systems
Year:
2012

Abstract

Real-time data processing frameworks such as S4 and Flume have become scalable and reliable solutions for acquiring, moving, and processing the large volumes of data continuously produced by large numbers of online sources. Yet these frameworks lack the elasticity to scale horizontally up or down based on current rates of input events and desired event processing latencies. The Project Hoover middleware provides distributed methods for measuring, aggregating, and analyzing the performance of distributed Flume components, thereby enabling online configuration changes to meet varying processing demands. Experimental evaluations with a sample Flume data processing code show that Hoover can dynamically and continuously monitor Flume performance, and that the resulting monitoring data can be used to right-size the number of Flume collectors according to different log production rates.
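
The right-sizing idea in the abstract can be illustrated with a minimal sketch. It is not Hoover's actual policy and uses no Flume API: the function name `collectors_needed`, the per-collector capacity figure, and the `headroom` parameter are all hypothetical, introduced only to show how aggregated log production rates might be mapped to a collector count.

```python
import math


def collectors_needed(source_rates, per_collector_capacity,
                      headroom=0.2, min_collectors=1):
    """Estimate how many collectors an aggregate event rate calls for.

    source_rates: measured events/second for each log source (assumed input).
    per_collector_capacity: events/second one collector can sustain
        (assumed to be known from prior measurement).
    headroom: fraction of capacity kept free to absorb bursts (hypothetical knob).
    """
    if per_collector_capacity <= 0:
        raise ValueError("per_collector_capacity must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")

    # Aggregate the per-source measurements into one input rate.
    total_rate = sum(source_rates)

    # Capacity each collector may use after reserving headroom.
    usable = per_collector_capacity * (1.0 - headroom)

    # Round up so the aggregate rate never exceeds usable capacity,
    # and never drop below the configured minimum.
    return max(min_collectors, math.ceil(total_rate / usable))


if __name__ == "__main__":
    # Three sources at different log production rates (illustrative numbers).
    print(collectors_needed([400, 350, 250], per_collector_capacity=500))
```

A real controller would also need to smooth the measured rates and limit how often it reconfigures, so that short bursts do not cause collectors to be added and removed repeatedly.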