Workload characterization for operator-based distributed stream processing applications

Authors:
Xiaolan J. Zhang;Sujay Parekh;Bugra Gedik;Henrique Andrade;Kun-Lung Wu
Affiliations:
University of Illinois, Urbana-Champaign, IL;IBM T.J. Watson Research Center, Hawthorne, NY;IBM T.J. Watson Research Center, Hawthorne, NY;IBM T.J. Watson Research Center, Hawthorne, NY;IBM T.J. Watson Research Center, Hawthorne, NY
Venue:
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
Year:
2010

Citing 19
Cited 3

Analysis of multithreaded architectures for parallel computing

SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
System support for automatic profiling and optimization

Proceedings of the sixteenth ACM symposium on Operating systems principles
A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Review of Performance Analysis Tools for MPI Parallel Programs

Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
TelegraphCQ: continuous dataflow processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Parallel program performance prediction using deterministic task graph analysis

ACM Transactions on Computer Systems (TOCS)
A Portable Programming Interface for Performance Evaluation on Modern Processors

International Journal of High Performance Computing Applications
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Distributed operation in the Borealis stream processing engine

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Measuring and improving application performance with PerfSuite

Linux Journal
The Tau Parallel Performance System

International Journal of High Performance Computing Applications
Design, implementation, and evaluation of the linear road bnchmark on the stream processing core

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
SPC: a distributed, scalable platform for data mining

Proceedings of the 4th international workshop on Data mining standards, services and platforms
SPADE: the system s declarative stream processing engine

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SODA: An Optimizing Scheduler for Large-Scale Stream-Based Distributed Computer Systems

Middleware '08 Proceedings of the ACM/IFIP/USENIX 9th International Middleware Conference
Scale-Up Strategies for Processing High-Rate Data Streams in System S

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
A code generation approach to optimizing high-performance distributed data stream processing

Proceedings of the 18th ACM conference on Information and knowledge management
HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org

Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing

Hirundo: a mechanism for automated production of optimized data stream graphs

ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Generating synthetic task graphs for simulating stream computing systems

Journal of Parallel and Distributed Computing
A performance analysis of system s, s4, and esper via two level benchmarking

QEST'13 Proceedings of the 10th international conference on Quantitative Evaluation of Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Operator-based programming languages provide an effective development model for large scale stream processing applications. A stream processing application consists of many runtime deployable software processing elements (PE) that work in flows to process incoming messages. Operators (OP) are logical building blocks hosted by PEs. One or more OPs can be fused into a PE at compile-time. Performance optimization for our streaming system includes compile-time fusion optimization and runtime PE-to-host deployment. One of the goals of an optimized stream application is to use minimal computing resource to sustain maximal message throughput. Characterizing the resource usage of PEs is critical for performance optimization. During compile-time optimization, OP-level resource usage is used to predict the resource usage of fused PEs. When starting an application, PE-level resource usage is used as an initial estimation by the scheduler. In this paper, we propose an efficient workload characterization approach for data stream processing systems. Our method includes the procedures for obtaining reusable OP-level resource usage information from profiling data and recomposing OP-level profiles to predict PE-level resource usage. We present several techniques to overcome measurement errors from the OP data collection. The impact of hardware speed and multi-threading contention on hyper-threading and multi-core machines are also studied. We show that our method can be applied to several streaming applications and the prediction of the PE CPU resource usage is within 15% of the actual CPU usage.