Operator-based programming languages provide an effective development model for large-scale stream processing applications. A stream processing application consists of many runtime-deployable software processing elements (PEs) that work in flows to process incoming messages. Operators (OPs) are the logical building blocks hosted by PEs, and one or more OPs can be fused into a PE at compile time. Performance optimization for our streaming system includes compile-time fusion optimization and runtime PE-to-host deployment. One goal of an optimized stream application is to sustain maximal message throughput with minimal computing resources, so characterizing the resource usage of PEs is critical for performance optimization. During compile-time optimization, OP-level resource usage is used to predict the resource usage of fused PEs. When an application starts, PE-level resource usage serves as the scheduler's initial estimate. In this paper, we propose an efficient workload characterization approach for data stream processing systems. Our method includes procedures for obtaining reusable OP-level resource usage information from profiling data and for recomposing OP-level profiles to predict PE-level resource usage. We present several techniques to overcome measurement errors in the OP data collection. We also study the impact of hardware speed and of multi-threading contention on hyper-threading and multi-core machines. We show that our method can be applied to several streaming applications and that the predicted PE CPU usage is within 15% of the actual CPU usage.
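The recomposition step can be pictured with a minimal sketch. It is not the model used in the paper: it assumes, purely for illustration, that each OP profile reduces to a CPU cost per tuple and an input rate, and that the CPU usage of a fused PE is the sum of its OPs' usage plus an optional fixed overhead. The names `OpProfile`, `predict_pe_cpu`, `relative_error`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class OpProfile:
    """Hypothetical reusable OP-level profile derived from profiling data."""
    name: str
    cpu_seconds_per_tuple: float  # assumed: average CPU time to process one tuple
    input_rate: float             # assumed: tuples per second arriving at the OP


def predict_pe_cpu(ops: Iterable[OpProfile], fusion_overhead: float = 0.0) -> float:
    """Predict PE CPU usage as a fraction of one core.

    Assumption: OP costs add up linearly when OPs are fused into one PE.
    """
    return sum(op.cpu_seconds_per_tuple * op.input_rate for op in ops) + fusion_overhead


def relative_error(predicted: float, actual: float) -> float:
    """Relative prediction error against a measured CPU usage."""
    return abs(predicted - actual) / actual


# Made-up profiles for three OPs fused into a single PE.
example_ops = [
    OpProfile("parse", 2e-5, 10_000),
    OpProfile("filter", 5e-6, 10_000),
    OpProfile("aggregate", 4e-5, 2_500),
]

predicted = predict_pe_cpu(example_ops)   # roughly 0.35 of one core
error = relative_error(predicted, 0.40)   # 0.40 is a made-up measurement
```

With these made-up numbers the prediction is about 35% of one core against a hypothetical measurement of 40%, a relative error of about 12.5%. A linear sum ignores the hardware-speed and contention effects that the abstract says are studied, so it should be read only as a starting point.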