Input data organization for batch processing in time window based computations

  • Authors:
  • Leonardo Aniello;Leonardo Querzoni;Roberto Baldoni

  • Affiliations:
  • University of Rome "La Sapienza", Rome, Italy;University of Rome "La Sapienza", Rome, Italy;University of Rome "La Sapienza", Rome, Italy

  • Venue:
  • Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Applications based on event processing are often designed to continuously evaluate set of events defined by sliding time windows. Solutions employing long-running continuous queries executed in-memory show their limits in applications characterized by a staggering growth of available sources that continuously produce new events at high rates (e.g. intrusion detection systems and algorithmic trading). Problems arise due to the complexities in maintaining large amounts of events in memory for continuous elaboration, and due to the difficulties in managing at run-time the network of elaborating nodes. A batch approach to this kind of computation provides a viable solution for scenarios characterized by non frequent computations of very large time windows. In this paper we propose a model for batch processing in time window event computations that allows the definition of multiple metrics for performance optimization. These metrics specifically take into account the organization of input data to minimize its impact on computation latency. The model is then instantiated on Hadoop, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.