Automatic optimization of stream programs via source program operator graph transformations

  • Authors:
  • Miyuru Dayarathna; Toyotaro Suzumura

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan 152-8552; Department of Computer Science, Tokyo Institute of Technology/IBM Research-Tokyo, Tokyo, Japan 152-8552

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2013

Abstract

Distributed data stream processing is a data analysis paradigm in which massive amounts of data produced by various sources are analyzed online under real-time constraints. The execution performance of a stream program or query running on distributed stream processing middleware depends largely on the programmer's ability to fine-tune the program to match the topology of the stream processing system. However, manually fine-tuning a stream program is a difficult, error-prone process that demands considerable programmer time and expertise, both of which are expensive to obtain. We describe an automated process for stream program performance optimization that uses semantics-preserving automatic code transformation to improve stream processing job performance. We first identify the structure of the input program and represent it as a directed acyclic graph (DAG). We then transform this graph using two techniques, Tri-OP Transformation and Bi-OP Transformation. The resulting space of sample programs is pruned using both empirical and profiling information to obtain a ranked list of sample programs that outperform their parent program. We implemented this methodology in a prototype stream program performance optimization mechanism called Hirundo, which optimizes SPADE programs running on the System S stream processing runtime. Using five real-world applications (VWAP, CDR, Twitter, Apnoea, and Bargain), we demonstrate the effectiveness of our approach. On a cluster of 4 nodes, Hirundo identified a version of the CDR application with 31.1 times higher performance within seven minutes.
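The pipeline described above (represent the program as a DAG of operators, locate places where transformations can apply, then keep and rank only the sample programs that beat their parent) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hirundo's implementation: the class `OperatorGraph` and the function `rank_samples` are hypothetical names, and treating every pair of directly connected operators as a candidate Bi-OP site and every three-operator chain as a candidate Tri-OP site is an assumption made for illustration; the paper's exact definitions of these transformations may differ. Throughput values in the usage example are made-up illustrative numbers, not measurements from the paper.

```python
from collections import deque
from typing import Dict, List, Tuple


class OperatorGraph:
    """Hypothetical DAG of stream operators: nodes are operator names,
    edges are streams flowing from a producer to a consumer."""

    def __init__(self) -> None:
        self.succ: Dict[str, List[str]] = {}

    def add_stream(self, producer: str, consumer: str) -> None:
        # Record the edge and make sure both endpoints exist as nodes.
        self.succ.setdefault(producer, []).append(consumer)
        self.succ.setdefault(consumer, [])

    def topological_order(self) -> List[str]:
        # Kahn's algorithm; raises if the operator graph contains a cycle.
        indeg = {op: 0 for op in self.succ}
        for outs in self.succ.values():
            for c in outs:
                indeg[c] += 1
        queue = deque(op for op, d in indeg.items() if d == 0)
        order: List[str] = []
        while queue:
            op = queue.popleft()
            order.append(op)
            for c in self.succ[op]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    queue.append(c)
        if len(order) != len(self.succ):
            raise ValueError("operator graph is not acyclic")
        return order

    def operator_pairs(self) -> List[Tuple[str, str]]:
        # ASSUMPTION: candidate Bi-OP sites = directly connected operator pairs.
        return [(p, c) for p in self.topological_order() for c in self.succ[p]]

    def operator_triples(self) -> List[Tuple[str, str, str]]:
        # ASSUMPTION: candidate Tri-OP sites = chains of three connected operators.
        return [(a, b, c) for a, b in self.operator_pairs() for c in self.succ[b]]


def rank_samples(parent_throughput: float,
                 samples: Dict[str, float]) -> List[Tuple[str, float]]:
    """Keep only sample programs whose measured throughput beats the
    parent program, ordered from best to worst."""
    better = [(name, t) for name, t in samples.items() if t > parent_throughput]
    return sorted(better, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    g = OperatorGraph()
    for p, c in [("Source", "Filter"), ("Filter", "Aggregate"), ("Aggregate", "Sink")]:
        g.add_stream(p, c)
    print(g.topological_order())  # ['Source', 'Filter', 'Aggregate', 'Sink']
    print(g.operator_triples())
    # Illustrative throughputs only: s1 is slower than the parent and is pruned.
    print(rank_samples(100.0, {"s1": 90.0, "s2": 250.0, "s3": 130.0}))
```

Keeping the graph as a successor map makes both the topological ordering and the enumeration of adjacent operators cheap, and separating candidate enumeration from ranking mirrors the abstract's split between generating a sample program space and pruning it with measured performance.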