Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Principles of database and knowledge-base systems, Vol. I
Principles of database and knowledge-base systems, Vol. I
Querying very large multi-dimensional datasets in ADR
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
The VLDB Journal — The International Journal on Very Large Data Bases
Design, implementation, and evaluation of the linear road bnchmark on the stream processing core
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Expressing and exploiting concurrency in networked applications with aspen
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Towards Autonomic Fault Recovery in System-S
ICAC '07 Proceedings of the Fourth International Conference on Autonomic Computing
SPC: a distributed, scalable platform for data mining
Proceedings of the 4th international workshop on Data mining standards, services and platforms
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SPADE: the system s declarative stream processing engine
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
XStream: a Signal-Oriented Data Stream Management System
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Scale-Up Strategies for Processing High-Rate Data Streams in System S
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Tools and strategies for debugging distributed stream processing applications
Software—Practice & Experience
COLA: optimizing stream processing applications via graph partitioning
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
COLA: optimizing stream processing applications via graph partitioning
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Workload characterization for operator-based distributed stream processing applications
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
Design principles for developing stream processing applications
Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
Processing high data rate streams in System S
Journal of Parallel and Distributed Computing
From a stream of relational queries to distributed stream processing
Proceedings of the VLDB Endowment
Fault injection-based assessment of partial fault tolerance in stream processing applications
Proceedings of the 5th ACM international conference on Distributed event-based system
SpamWatcher: a streaming social network analytic on the IBM wire-speed processor
Proceedings of the 5th ACM international conference on Distributed event-based system
Hirundo: a mechanism for automated production of optimized data stream graphs
ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Highly scalable speech processing on data stream management system
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part II
Software—Practice & Experience
Automatic optimization of stream programs via source program operator graph transformations
Distributed and Parallel Databases
Hi-index | 0.00 |
We present a code-generation-based optimization approach to bringing performance and scalability to distributed stream processing applications. We express stream processing applications using an operator-based, stream-centric language called SPADE, which supports composing distributed data flow graphs out of toolkits of type-generic operators. A major challenge in building such applications is to find an effective and flexible way of mapping the logical graph of operators into a physical one that can be deployed on a set of distributed nodes. This involves finding how best operators map to processes and how best processes map to computing nodes. In this paper, we take a two-stage optimization approach, where an instrumented version of the application is first generated by the SPADE compiler to profile and collect statistics about the processing and communication characteristics of the operators within the application. In the second stage, the profiling information is fed to an optimizer to come up with a physical data flow graph that is deployable across nodes in a computing cluster. This approach not only creates highly optimized applications that are tailored to the underlying computing and networking infrastructure, but also makes it possible to re-target the application to a different hardware setup by simply repeating the optimization step and re-compiling the application to match the physical flow graph produced by the optimizer. Using real-world applications, from diverse domains such as finance and radio-astronomy, we demonstrate the effectiveness of our approach on System S -- a large-scale, distributed stream processing platform.