Distributed data stream processing applications are often characterized by data flow graphs consisting of a large number of built-in and user-defined operators connected via streams. These flow graphs are typically deployed on a large set of nodes. The data processing is carried out on-the-fly, as tuples arrive at possibly very high rates, with minimum latency. It is well known that developing and debugging distributed, multi-threaded, and asynchronous applications, such as stream processing applications, can be challenging. Thus, without domain-specific debugging support, developers struggle when debugging distributed applications. In this paper, we describe tools and language support for debugging distributed stream processing applications. Our key insight is to view debugging of stream processing applications from four different, but related, perspectives. First, debugging the semantics of the application involves verifying the operator-level composition and inspecting the flows at the logical level. Second, debugging the user-defined operators involves traditional source-code debugging, but strongly tied to the stream-level interactions. Third, debugging the deployment details of the application requires understanding the runtime physical layout and configuration of the application. Fourth, debugging the performance of the application requires inspecting various performance metrics (such as communication rates, CPU utilization, etc.) associated with streams, operators, and nodes in the system.
In light of this characterization, we developed several tools: a debugger-aware compiler and an associated stream debugger, composition and deployment visualizers, and performance visualizers. We also developed language support, including configuration knobs for logging and tracing, deployment configurations such as operator-to-process and process-to-node mappings, monitoring directives to inspect streams, and special sink adapters to intercept and dump streaming data to files and sockets. We describe these tools in the context of Spade, a language for creating distributed stream processing applications, and System S, a distributed stream processing middleware under development at the IBM Watson Research Center. Published in 2009 by John Wiley & Sons, Ltd. This article is a U.S. Government work and is in the public domain in the U.S.A.
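To make the idea of intercepting a stream more concrete, the following is a minimal Python sketch of a pass-through "tap" that dumps every tuple on a stream and keeps a per-stream tuple count. It is not Spade code and does not use any System S API; the names `tap`, `source`, `keep_even`, and `run_pipeline`, the JSON dump format, and the toy data are all assumptions made purely for illustration.

```python
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def tap(stream: Iterable[dict], out: TextIO, name: str,
        counts: Dict[str, int]) -> Iterator[dict]:
    """Hypothetical tap: forward tuples unchanged, dumping and counting each."""
    for tup in stream:
        # Dump the tuple, tagged with the (illustrative) stream name.
        out.write(json.dumps({"stream": name, "tuple": tup}) + "\n")
        counts[name] = counts.get(name, 0) + 1
        yield tup


def source() -> Iterator[dict]:
    """Toy source operator emitting five tuples."""
    for i in range(5):
        yield {"id": i, "value": i * 10}


def keep_even(stream: Iterable[dict]) -> Iterator[dict]:
    """Toy user-defined operator: keep tuples whose id is even."""
    return (t for t in stream if t["id"] % 2 == 0)


def run_pipeline(dump: TextIO) -> Dict[str, int]:
    """Wire source -> tap -> filter -> tap and drain the flow graph."""
    counts: Dict[str, int] = {}
    raw = tap(source(), dump, "raw", counts)
    filtered = tap(keep_even(raw), dump, "filtered", counts)
    for _ in filtered:  # act as the final sink; tuples are discarded
        pass
    return counts


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a file or socket
    print(run_pipeline(buffer))
    print(buffer.getvalue(), end="")
```

Because the tap only observes tuples and forwards them unchanged, it can be placed on any edge of the flow graph without altering the application's results. The per-stream counts are a crude stand-in for the stream-level metrics mentioned above.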