Customizable parallel execution of scientific stream queries

Authors:
Milena Ivanova;Tore Risch
Affiliations:
Uppsala University, Sweden;Uppsala University, Sweden
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Abstract

Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application-dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates. Using a generic template, we define two partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides operators for parallel execution of query functions by reducing the size of stream data units, using application-dependent functions as parameters. By contrast, window distribute provides operators for customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high-volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribute is better otherwise.
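
The contrast between the two strategies can be illustrated with a minimal Python sketch. It is not the system's actual operators or template language. All names (`window_split`, `window_distribute`, `chunk`, `round_robin`, `sum_sq`) are hypothetical, a thread pool stands in for the distributed nodes, and a sum of squares stands in for an expensive query function. The sketch assumes the query function can be decomposed so that results on pieces of a window can be combined into the result for the whole window.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Window = List[float]  # one stream data unit (hypothetical representation)


def window_split(windows, query, split, combine, n_workers):
    """Cut each window into smaller pieces, run the query on the pieces
    in parallel, then combine the partial results per window."""
    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for w in windows:
            parts = split(w, n_workers)            # application-dependent split
            partials = list(pool.map(query, parts))
            results.append(combine(partials))      # application-dependent combine
    return results


def window_distribute(windows, query, route, n_workers):
    """Send entire windows, unchanged in size, to workers chosen by a
    routing function, then restore the original stream order."""
    batches = [[] for _ in range(n_workers)]
    for i, w in enumerate(windows):
        batches[route(i, n_workers)].append((i, w))

    def run(batch):
        return [(i, query(w)) for i, w in batch]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        tagged = [pair for out in pool.map(run, batches) for pair in out]
    return [r for _, r in sorted(tagged, key=lambda p: p[0])]


# Assumed example parameters, chosen only for illustration.
def chunk(w: Window, n: int) -> List[Window]:
    size = max(1, -(-len(w) // n))  # ceiling division, at least 1
    return [w[i:i + size] for i in range(0, len(w), size)]


def round_robin(index: int, n: int) -> int:
    return index % n


def sum_sq(w: Window) -> float:
    return sum(x * x for x in w)


stream = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]]
split_result = window_split(stream, sum_sq, chunk, sum, n_workers=2)
dist_result = window_distribute(stream, sum_sq, round_robin, n_workers=2)
print(split_result, dist_result)
```

Both calls return the same per-window results. The difference lies in what is shipped to each worker: pieces of a window in the first case, whole windows in the second.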