SPADE: the System S declarative stream processing engine

Authors:
Bugra Gedik;Henrique Andrade;Kun-Lung Wu;Philip S. Yu;Myungcheol Doo
Affiliations:
IBM Thomas J. Watson Research Center, Hawthorne, NY, USA;IBM Thomas J. Watson Research Center, Hawthorne, NY, USA;IBM Thomas J. Watson Research Center, Hawthorne, NY, USA;University of Illinois at Chicago, Chicago, IL, USA;Georgia Institute of Technology, Atlanta, GA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

In this paper, we present Spade, the System S declarative stream processing engine. System S is a large-scale, distributed data stream processing middleware under development at the IBM T. J. Watson Research Center. As a front-end for rapid application development for System S, Spade provides (1) an intermediate language for flexible composition of parallel and distributed data-flow graphs, (2) a toolkit of type-generic, built-in stream processing operators that support scalar as well as vectorized processing and can seamlessly interoperate with user-defined operators, and (3) a rich set of stream adapters to ingest data from and publish data to outside sources. More importantly, Spade automatically brings performance optimization and scalability to System S applications. To that end, Spade employs a code generation framework to create highly optimized applications that run natively on the Stream Processing Core (SPC), the execution and communication substrate of System S, and take full advantage of other System S services. Spade allows developers to construct their applications from fine-grained stream operators without worrying about the performance implications this granularity might have, even in a distributed system. Spade's optimizing compiler automatically maps applications into appropriately sized execution units in order to minimize communication overhead while still exploiting the available parallelism. By virtue of the scalability of the System S runtime and Spade's effective code generation and optimization, we can scale applications to a large number of nodes. Currently, we can run Spade jobs on approximately 500 processors across more than 100 physical nodes in a tightly connected cluster environment. Spade has been in use at IBM Research to create real-world streaming applications, ranging from monitoring financial market feeds to radio telescopes to semiconductor fabrication lines.
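
To make the idea of mapping fine-grained operators into appropriately sized execution units more concrete, the following is a minimal Python sketch. It is not Spade's compiler algorithm, which the abstract does not describe; it swaps in a simple greedy heuristic for illustration. The function name `fuse_operators`, the operator names, the per-operator costs, the stream rates, and the `capacity` limit are all hypothetical. The sketch merges operators joined by the highest-rate streams into one unit, as long as the unit's total cost stays within the capacity, and reports the traffic that still crosses unit boundaries.

```python
from collections import defaultdict


def fuse_operators(edges, cost, capacity):
    """Greedily group operators into execution units (illustrative only).

    edges:    {(src, dst): stream rate} for each stream in the data-flow graph
    cost:     {operator: processing cost}
    capacity: maximum total cost allowed inside one execution unit
    Returns (units, cut), where cut is the total rate crossing unit boundaries.
    """
    parent = {op: op for op in cost}  # union-find forest over operators
    load = dict(cost)                 # total cost of each unit, keyed by its root

    def find(op):
        # Follow parent links to the unit's root, halving the path on the way.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    # Visit the heaviest streams first: fusing their endpoints saves the most
    # inter-unit communication.
    for (src, dst), _rate in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = find(src), find(dst)
        if a != b and load[a] + load[b] <= capacity:
            parent[b] = a
            load[a] += load[b]

    units = defaultdict(list)
    for op in cost:
        units[find(op)].append(op)

    cut = sum(rate for (s, d), rate in edges.items() if find(s) != find(d))
    return sorted(sorted(ops) for ops in units.values()), cut


# Hypothetical five-operator pipeline.
COST = {"source": 2, "parse": 3, "filter": 1, "aggregate": 4, "sink": 1}
EDGES = {
    ("source", "parse"): 100,
    ("parse", "filter"): 100,
    ("filter", "aggregate"): 20,
    ("aggregate", "sink"): 5,
}

units, cut = fuse_operators(EDGES, COST, capacity=6)
print(units)  # [['aggregate', 'sink'], ['filter', 'parse', 'source']]
print(cut)    # 20
```

In this toy run the two high-rate streams end up inside a single unit, so only the lower-rate stream between `filter` and `aggregate` crosses a unit boundary, while the two resulting units can still run in parallel. A smaller `capacity` yields more units and more parallelism at the price of more communication, which is the trade-off the abstract refers to.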