Challenges and experience in prototyping a multi-modal stream analytic and monitoring application on System S

Authors:
Kun-Lung Wu;Kirsten W. Hildrum;Wei Fan;Philip S. Yu;Charu C. Aggarwal;David A. George;Buğra Gedik;Eric Bouillet;Xiaohui Gu;Gang Luo;Haixun Wang
Affiliations:
IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

In this paper, we describe the challenges of prototyping a reference application on System S, a distributed stream processing middleware under development at IBM Research. Such an application comprises a large number of stream PEs (Processing Elements) that implement various stream analytic algorithms, run on a large-scale, distributed cluster of nodes, and collaboratively digest several multi-modal source streams with vastly differing rates. Specifically, we focus on our experience in prototyping DAC (Disaster Assistance Claim monitoring), a reference application for multi-modal stream analytics and monitoring. We describe three critical challenges: (1) How do we generate correlated, multi-modal source streams for DAC? (2) How do we design and implement a comprehensive stream application, like DAC, from many divergent stream analytic PEs? (3) How do we deploy DAC in light of source streams with extremely different rates? We report our experience in addressing these challenges, including modeling a disaster claim processing center to generate correlated source streams, constructing the PE flow graph, utilizing programming support from System S, adopting parallelism, and exploiting resource-adaptive computation.