Modeling performance of a parallel streaming engine: bridging theory and costs

  • Authors:
  • Ivan Bedini;Sherif Sakr;Bart Theeten;Alessandra Sala;Peter Cogan

  • Affiliations:
  • Bell Labs, Alcatel-Lucent, Dublin, Ireland;Bell Labs, Alcatel-Lucent, Dublin, Ireland;Bell Labs, Alcatel-Lucent, Antwerp, Belgium;Bell Labs, Alcatel-Lucent, Dublin, Ireland;Bell Labs, Alcatel-Lucent, Dublin, Ireland

  • Venue:
  • Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
  • Year:
  • 2013

Abstract

While data are growing at an unprecedented rate, parallel computing is becoming increasingly essential for processing this massive volume of data in a timely manner. Consequently, concurrent computation has been receiving increasing attention, driven by the widespread adoption of multi-core processors and advances in cloud computing technology. Ubiquitous mobile devices, location services, and pervasive sensors are examples of new scenarios that have created a crucial need for scalable computing platforms and parallel architectures to process vast amounts of generated streaming data. In practice, operating these systems efficiently is hard due to the intrinsic complexity of these architectures and the lack of formal, in-depth knowledge of their performance models and the consequent system costs. The Actor Model theory has been presented as a mathematical model of concurrent computation that has had enormous success in practice and has inspired a number of contemporary works in this area. Recently, the Storm system has been presented as a realization of the principles of the Actor Model theory in the context of large-scale processing of streaming data. In this paper, we present, to the best of our knowledge, the first set of models that formalize the performance characteristics of a practical distributed, parallel, and fault-tolerant stream processing system that follows the Actor Model theory. In particular, we model the characteristics of the data flow, the data processing, and the system management costs at a fine granularity within the different steps of executing a distributed stream processing job. Finally, we present an experimental validation of the described performance models using the Storm system.