Automatic optimization of stream programs via source program operator graph transformations

Authors:
Miyuru Dayarathna;Toyotaro Suzumura
Affiliations:
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan 152-8552;Department of Computer Science, Tokyo Institute of Technology/IBM Research-Tokyo, Tokyo, Japan 152-8552
Venue:
Distributed and Parallel Databases
Year:
2013

Abstract

Distributed data stream processing is a data analysis paradigm in which massive amounts of data produced by various sources are analyzed online within real-time constraints. The execution performance of a stream program or query running on stream processing middleware depends largely on the programmer's ability to fine-tune the program to match the topology of the stream processing system. However, manually fine-tuning a stream program is a difficult, error-prone process that demands large amounts of programmer time and expertise, both of which are expensive to obtain. We describe an automated process for stream program performance optimization that uses semantics-preserving automatic code transformation to improve stream processing job performance. We first identify the structure of the input program and represent it as a directed acyclic graph (DAG). We then transform the graph using the concepts of Tri-OP Transformation and Bi-OP Transformation. The resulting space of sample programs is pruned using both empirical and profiling information to obtain a ranked list of sample programs that perform better than their parent program. We implemented this methodology in a prototype stream program performance optimization mechanism called Hirundo, which has been developed for optimizing SPADE programs that run on the System S stream processing run-time. We show the effectiveness of our approach using five real-world applications (VWAP, CDR, Twitter, Apnoea, and Bargain). Hirundo identified a version of the CDR application with 31.1 times higher performance within seven minutes on a cluster of 4 nodes.
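
The general shape of the pipeline described in the abstract (operator graph as a DAG, graph rewriting, pruning, ranking) can be illustrated with a minimal Python sketch. This is not Hirundo's implementation: the abstract does not define the Tri-OP or Bi-OP transformations, so a hypothetical operator-replication rewrite stands in for them, and the `measure` callable is an assumed placeholder for whatever empirical or profiling score a real system would collect. All function and operator names below are invented for illustration.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Callable

# Hypothetical representation: operator name -> list of downstream operators.
Graph = dict[str, list[str]]


def is_dag(graph: Graph) -> bool:
    """Return True if the operator graph contains no cycle."""
    try:
        # Edge direction does not matter for cycle detection.
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def replicate_operator(graph: Graph, op: str, copies: int) -> Graph:
    """Stand-in rewrite (NOT Tri-OP/Bi-OP): replace `op` with parallel replicas.

    Every upstream edge into `op` is fanned out to all replicas, and each
    replica keeps the downstream edges of the original operator.
    """
    replicas = [f"{op}_{i}" for i in range(copies)]
    rewritten: Graph = {}
    for node, downstream in graph.items():
        if node == op:
            continue
        rewired: list[str] = []
        for target in downstream:
            if target == op:
                rewired.extend(replicas)
            else:
                rewired.append(target)
        rewritten[node] = rewired
    for replica in replicas:
        rewritten[replica] = list(graph[op])
    return rewritten


def generate_candidates(parent: Graph, ops: list[str], max_copies: int) -> list[Graph]:
    """Build the sample program space by replicating each listed operator."""
    return [
        replicate_operator(parent, op, copies)
        for op in ops
        for copies in range(2, max_copies + 1)
    ]


def rank_candidates(
    parent: Graph,
    candidates: list[Graph],
    measure: Callable[[Graph], float],
) -> list[tuple[float, Graph]]:
    """Prune candidates that do not beat the parent, then rank best first."""
    baseline = measure(parent)
    scored = [(measure(candidate), candidate) for candidate in candidates]
    better = [pair for pair in scored if pair[0] > baseline]
    better.sort(key=lambda pair: pair[0], reverse=True)
    return better


if __name__ == "__main__":
    # A toy four-operator pipeline; operator names are illustrative only.
    parent = {
        "Source": ["Filter"],
        "Filter": ["Aggregate"],
        "Aggregate": ["Sink"],
        "Sink": [],
    }

    # Toy score: pretend throughput grows with the number of Aggregate replicas.
    def toy_measure(graph: Graph) -> float:
        return float(sum(1 for name in graph if name.startswith("Aggregate")))

    candidates = generate_candidates(parent, ["Filter", "Aggregate"], max_copies=3)
    for score, graph in rank_candidates(parent, candidates, toy_measure):
        print(score, sorted(graph))
```

In this sketch the score function is deliberately trivial; in practice it would be replaced by measurements taken from short profiling runs of each generated program, which is where most of the search time would be spent.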